Preface


This Lecture Series will present a method and procedure for establishing a nondestructive inspection programme with the reliability needed to ensure a high probability of detecting anomalies in engine parts. This Lecture Series is intended for those involved with production quality assurance, overhaul of turbine engines, development of NDE/NDI methods, and the application of statistical methods. The material to be presented is applicable to civil as well as military aircraft and turbine engine manufacturing and maintenance organizations. The Lecture Series draws upon the results of a ten-year government/industry study of NDE/NDI systems in the United States. The lectures will examine the detection capabilities of various NDE/NDI methods, the statistical theory of quantifying the reliability of inspections, the evaluation of inspection results in retirement-for-cause decisions, and the procedure required to establish a reliable probability-based inspection system. The lecturers will share lessons learned in the design of experiments to validate NDE/NDI systems and in the interpretation of the results of these experiments. Samples of specimens used in NDE/NDI reliability programmes will be available for inspection by attendees. The lecturers have practical experience in applying the lecture material to design and maintenance. The lecture book has examples to help with understanding the design of experiments and the statistical modelling for probability of detection analyses.


1.  SUMMARY

The purpose of this document is to provide testing and evaluation procedures for assessing Nondestructive Evaluation (NDE) system capability. Using this document, an NDE system can be demonstrated to meet specified requirements, and major sources of variation can be identified and measured. Included in this document is a methodology to establish a reliable, quantifiable, probability-based inspection system. The NDE procedures addressed herein are those used to inspect gas turbine engine components; they are applicable to airframes as well.

Specifically, the procedures addressed are Eddy Current Testing (ET), Fluorescent Penetrant Testing (PT), Ultrasonic Testing (UT), and Magnetic Particle Testing (MT).

2.  SYMBOLS/DEFINITIONS 

$a$  Flaw size. Actual physical dimension of a flaw; can be its depth, its surface length, or the diameter of a circular (or radius of a semi-circular or corner) flaw having the same cross-sectional area.

$\hat{a}$  Measured response of the NDE system to a flaw of size $a$. Units depend on the inspection apparatus and can be scale divisions, counts, number of contiguous illuminated pixels, or millivolts.

$a_{50}$  Flaw size at 50% POD.

$\hat{a}_{dec}$  Decision threshold. Value of $\hat{a}$ above which the signal is interpreted as a hit, and below which the signal is interpreted as a miss. It is the $\hat{a}$ value associated with 50% POD. Decision threshold is always greater than or equal to inspection threshold.

$\hat{a}_{sat}$  Saturation. Value of $\hat{a}$ as large as, or larger than, the maximum output of the system, or the largest value of $\hat{a}$ that the system can record.

$\hat{a}_{th}$  Inspection threshold. Value of $\hat{a}$ below which the signal is indistinguishable from the noise, or the smallest value of $\hat{a}$ that the system records. Inspection threshold is always less than or equal to decision threshold.

$\beta_0, \beta_1$  Intercept and slope of the linear relationship between $\log \hat{a}$ and $\log a$.

$\hat{\beta}_0, \hat{\beta}_1, \hat{\delta}$  Maximum likelihood estimators of parameters $\beta_0$, $\beta_1$, $\delta$.

censored data  Signal response either smaller than $\hat{a}_{th}$, and therefore indistinguishable from the noise (left censored), or greater than $\hat{a}_{sat}$ (right censored), and therefore a saturated response.

crack  A subset of flaws.

$\hat{d}$  A calculated flaw depth estimated from its signal response.

$\delta$  Standard error of residuals of regression of $\log \hat{a}$ on $\log a$.

ET  Eddy current testing.

factor  A variable whose effect on POD(a) is to be evaluated.
false call  An NDE system response interpreted as having detected a flaw but associated with no known flaw at the inspection location.

flaw  An undesirable discontinuity in a material.

hit  An NDE system result interpreted as having detected a flaw.

inspector  The person who actually applies the NDE technique, interprets the results, and determines the acceptance of the material per the applicable specifications. The inspector must be certified to the same level required for production inspectors, per MIL-STD-410 or SNT-TC-1A, for the NDE technique being applied.

MLE  Maximum likelihood estimation. A standard statistical method used to estimate numerical values for model parameters $\beta_0$, $\beta_1$, $\delta$, $\mu$, and $\sigma$.

miss  An NDE system response interpreted as not having detected a flaw.

MT  Magnetic particle testing.

NDE  Nondestructive evaluation, which encompasses both the inspection itself and the subsequent statistical and engineering analyses of the inspection data.

noise  Signal response containing no useful flaw characterization information.

POD(a)  Probability of detection. The fraction of flaws of nominal flaw size, $a$, which are expected to be detected (found).

PT  Fluorescent penetrant testing.

residual  The difference between an observed signal response and the response predicted from the model.

system operator  The person in charge of an automated or semi-automated system, who is responsible for keeping the mechanical, electrical, computer, and other systems in proper operating condition. The system operator should be certified to the same level required for production inspectors, per MIL-STD-410 or SNT-TC-1A, for the NDE technique being applied. In general, however, the system operator does not function as an inspector.

test monitor  The person assigned to monitor the system reliability testing per this document, and to ensure that all requirements of this specification are being met.

UT  Ultrasonic testing.
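The threshold definitions above can be illustrated with a short sketch. The following Python is a minimal example under stated assumptions, not part of the procedure: the record layout (`Reading`) and the function names are hypothetical, a response exactly equal to $\hat{a}_{dec}$ is treated as a miss, and a response at or above $\hat{a}_{sat}$ is treated as saturated. It labels each response as left censored, right censored, or uncensored, classifies it as a hit or a miss, and counts hits at flawed sites and false calls at unflawed sites.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    """One inspection-site result (hypothetical record layout)."""
    a_hat: float   # measured response, in the same units as the thresholds
    flawed: bool   # True if a known flaw is present at the site

def censoring(a_hat: float, a_th: float, a_sat: float) -> str:
    """Label a response 'left', 'right', or 'none' (uncensored)."""
    if a_hat < a_th:
        return "left"    # indistinguishable from the noise
    if a_hat >= a_sat:
        return "right"   # assumption: the recorder clips at a_sat
    return "none"

def is_hit(a_hat: float, a_dec: float) -> bool:
    """Above the decision threshold is a hit; equality counts as a miss (assumption)."""
    return a_hat > a_dec

def summarize(readings, a_th, a_dec, a_sat):
    """Hit fraction at flawed sites, false call rate at unflawed sites, censored count."""
    if a_th > a_dec:
        raise ValueError("inspection threshold must not exceed decision threshold")
    flawed = [r for r in readings if r.flawed]
    unflawed = [r for r in readings if not r.flawed]
    hits = sum(is_hit(r.a_hat, a_dec) for r in flawed)
    false_calls = sum(is_hit(r.a_hat, a_dec) for r in unflawed)
    return {
        "hit_fraction": hits / len(flawed) if flawed else float("nan"),
        "false_call_rate": false_calls / len(unflawed) if unflawed else float("nan"),
        "censored": sum(censoring(r.a_hat, a_th, a_sat) != "none" for r in readings),
    }
```

The hit fraction here pools all flaw sizes; it is a bookkeeping check, not an estimate of POD(a).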

3.  INTRODUCTION 

With the adoption of damage tolerance philosophies to establish the life of engine hardware, either for retirement-for-cause or for consideration of inherent part defects, it has become imperative to be able to quantify the probability of detection for NDE inspection techniques and systems. NDE systems are classified into two categories: those which produce only qualitative information as to the presence or absence of a flaw, i.e., hit/miss data, and those which also provide some quantitative measure of the size of the indicated flaw, i.e., $\hat{a}$ vs. $a$ data. This document establishes the procedures necessary to assess the reliability of NDE/NDI systems. It begins with the general requirements and then gives the specific requirements for each type of system. The Appendices provide the background information and equations necessary to understand the derivation of the probability of detection statistical analyses.
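For $\hat{a}$ vs. $a$ data, the symbols of Section 2 imply the usual model $\ln \hat{a} = \beta_0 + \beta_1 \ln a$ plus normally distributed residuals with standard error $\delta$. Treating a response above $\hat{a}_{dec}$ as a hit then gives $POD(a) = \Phi((\ln a - \mu)/\sigma)$ with $\mu = (\ln \hat{a}_{dec} - \beta_0)/\beta_1$ and $\sigma = \delta/\beta_1$, so that $a_{50} = e^{\mu}$. The sketch below assumes natural logarithms and fits $\beta_0$, $\beta_1$, and $\delta$ by ordinary least squares, which is appropriate only when no reading is censored; censored data call for maximum likelihood estimation. Function names are illustrative, and confidence limits are not computed.

```python
import math
from statistics import NormalDist

def fit_log_linear(sizes, responses):
    """Least-squares fit of ln(a_hat) = beta0 + beta1 * ln(a).

    Assumes every reading is uncensored (a_th <= a_hat <= a_sat); censored
    data require maximum likelihood estimation instead.
    """
    if len(sizes) != len(responses) or len(sizes) < 3:
        raise ValueError("need at least three paired readings")
    x = [math.log(a) for a in sizes]
    y = [math.log(r) for r in responses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    # delta: standard error of the residuals, with n - 2 degrees of freedom
    sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    delta = math.sqrt(sse / (n - 2))
    return beta0, beta1, delta

def pod_from_ahat(beta0, beta1, delta, a_dec):
    """Return (POD function, a_50) for the decision threshold a_dec."""
    if beta1 <= 0 or delta <= 0:
        raise ValueError("need an increasing response and nonzero scatter")
    mu = (math.log(a_dec) - beta0) / beta1   # ln a at which POD = 50%
    sigma = delta / beta1
    phi = NormalDist().cdf

    def pod(a):
        return phi((math.log(a) - mu) / sigma)

    return pod, math.exp(mu)
```

By construction, the predicted response at $a_{50}$ equals $\hat{a}_{dec}$, which matches the definition of the decision threshold as the $\hat{a}$ value associated with 50% POD.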

4.  GENERAL  REQUIREMENTS 

This section addresses the general requirements for assessing the capability of an NDE system in terms of the probability of detection (POD) as a function of flaw size, $a$. These general requirements are applicable to all NDE systems covered in this document and address the demonstrator responsibilities and the requirements for planning, conducting, analyzing, and reporting NDE reliability evaluations. Specific requirements that pertain to Eddy Current (ET), Fluorescent Penetrant (PT), Ultrasonic (UT), and Magnetic Particle (MT) inspection systems are contained in Section 5.

4.1  RESPONSIBILITIES 

The ultimate responsibility for ensuring the accuracy of the test/demonstration rests with the demonstrator. It is the demonstrator's responsibility to ensure that the requirements of this document have been met and that variances and discrepancies are noted and understood.

4.2  SYSTEM DEFINITION AND CONTROL

To be evaluated, the NDE system must be precisely defined in terms of the limits of its operational parameters and its range of application, and it must be demonstrated that the system is in control. In addition to the physical attributes of the NDE system, this definition may include planned statistical assessments of those components responsible for system variability.

4.3  DEMONSTRATION  DESIGN 

To ensure that the assessment of the NDE system is complete, the demonstrator will develop and submit for approval (to the office of responsibility) a Demonstration Design Document, in lay terms a test plan, which specifies the experimental design for the inspections; the method of obtaining and maintaining the structural specimens to be inspected; the procedures for performing the inspections; and the process for ensuring the inspection system is under control. The topics to be addressed in each of these areas include the following.

4.3.1  Experimental  Design 

The prime objective of an NDE reliability demonstration is to determine the POD versus flaw size relationship which defines the capability of an NDE system under representative application conditions. Variation in NDE system response (and, hence, uncertainty in detectability) is caused by both the physical attributes of a flaw and the NDE process variables or parameters. The uncertainty caused by differences between flaws is accounted for by using representative specimens with flaws of known size in the demonstration inspections (Subsection 4.3.2). The uncertainty caused by the NDE process is accounted for by a test matrix of different inspections to be performed on the complete set of specimens. If the experiment is properly designed and executed, a secondary objective of identifying those factors which significantly influence POD for the system can also be met.
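For hit/miss systems, one common way to express the POD versus flaw size relationship is a logistic function of $\ln a$, $POD(a) = 1/(1 + \exp(-(b_0 + b_1 \ln a)))$, with $b_0$ and $b_1$ estimated by maximum likelihood. The sketch below is an illustration under that assumption, not the analysis method of the Appendices: it uses plain Newton-Raphson iteration, omits confidence limits, and the names `fit_hit_miss` and `hit_miss_pod` are hypothetical. If hits and misses are perfectly separated by flaw size no finite estimate exists, so the routine is written to stop with an error rather than return a value when it fails to converge.

```python
import math

def _logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hit_miss_pod(b0, b1, a):
    """Log-logistic POD model: POD(a) = logistic(b0 + b1 * ln a)."""
    return _logistic(b0 + b1 * math.log(a))

def fit_hit_miss(sizes, hits, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates of (b0, b1) by Newton-Raphson."""
    x = [math.log(a) for a in sizes]
    y = [1.0 if h else 0.0 for h in hits]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _logistic(b0 + b1 * xi)
            w = p * (1.0 - p)        # Bernoulli variance at this site
            g0 += yi - p             # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ArithmeticError("information matrix is singular")
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1]
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            return b0, b1
    raise ArithmeticError("no convergence; hits and misses may be separated by size")
```

With the estimates in hand, $a_{50} = \exp(-b_0/b_1)$.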

The experimental design defines the conditions related to the NDE process parameters under which the demonstration inspections will be performed. In particular, the experimental design comprises:

1. The identification of the process variables which may influence flaw detectability but cannot be precisely controlled in the real inspection environment;

2. The specification of a matrix of inspection conditions which fairly represents the real inspection environment by accounting for the influencing variables in a manner which permits valid analyses;

3. The order for performing the individual inspections of the test matrix. (The number of flawed and unflawed inspection sites in the experiment could also be considered as part of the experimental design; this topic is addressed in Subsection 4.3.2.1.)

Although general guidelines for these areas are presented in the following subsections, it is recommended that a qualified statistician participate in the preparation of the experimental design.

4.3.1.1  Test variables

It  is  assumed  that  the  inspection  process  has 
been  defined  and  is  under  control  for  the 
demonstration  testing.  Even  so,  there  will  be 
factors  which  cannot  be  completely  controlled 
or  can  only  be  controlled  within  reasonable 
operational  limits.  To  evaluate  the  inspection 
system  in  the  application  environment,  these 
factors  must  be  identified  so  that  they  can  be 
fairly  represented  in  the  demonstration  tests. 
For example, in a manual inspection, it would not be acceptable to use only the known best inspector in the demonstration tests. Rather, the entire population of inspectors must be represented, as is discussed in Subsection 4.3.1.2.

The demonstrator will generate a list of process variables which can be expected to influence the efficacy of the NDE system. This list will provide the basis for generating the evaluation test matrix. To assure a thorough evaluation, it is recommended that the initial matrix include as many variables as possible. If it is demonstrated early in the test program that a particular variable is not significant, it may be eliminated from further consideration, resulting in a revised, smaller test matrix. To be eliminated, it must be shown that the variable has no significant effect on POD using the analysis methods of Appendices C and D. The office of responsibility reserves the right to expand or reduce the list of variables to be included in the test matrix.

As  a  minimum,  the  following  types  of  variables 
will  be  considered  in  generating  the  list  of  test 
variables: 

1. Part Preprocessing: This variable type includes factors such as part cleaning, preparation, contour, and surface condition. It could also include such things as the application of the penetrant for fluorescent penetrant readers. Early in the definition of the system acceptance test plan, a decision must be made as to how far upstream the requirements should extend. For a penetrant reading system, it may be decided not to consider the penetrant application as a variable, in which case every effort should be made to hold it constant for all systems being compared. If, however, a new system is being evaluated specifically because it may be less sensitive to preprocessing variables, these variables should be included in the test plan. The range of the variables to be considered in this case should be that allowed by the procedures used at the application site.

2.  Inspector:  In  many  applications  the 
human  conducting  the  inspection  is 
the  most  significant  variable  in  the 
process.  Conversely,  some  inspection 
systems  have  been  demonstrated  to 
be  very  inspector-independent.  The 
test  plan  should  include  the  inspection 
results  obtained  by  several  operators 
selected  at  random  from  among  the 
population  eligible  to  conduct  the 
inspection.  Eligibility  may  be  defined  in 
terms  of  a  particular  certification, 
training  or  physical  ability. 


3. Inspection Materials: Particular chemicals, concentrations, particle sizes, and such may be used in a given inspection. For example, PT inspections will use penetrants, emulsifiers, and developers, each of which may have a significant impact on inspection capability. System evaluation must be conducted considering the range of materials expected to be used in production. If different penetrants, for example, may be used, penetrant should be considered as a variable in defining the test matrix. If the operating procedures for the system preclude the use of other penetrants, they need not be included, but this clearly limits the generality of the system assessment.

4. Sensor: If the sensor used in the inspection system is replaceable, or if different sensors may be used for different applications of the system, as is the case for eddy current or ultrasonic inspections, sensors must also be a variable in the test matrix. The sensors used in the demonstration tests must be selected at random from a production lot. Sensors of each design planned for use with the system should be included in the test plan, with several of each design being evaluated.

5. Inspection Setup (Calibration): Electronic inspection processes in particular require instrumentation adjustments to assure the same inspection sensitivity independent of time or place. To evaluate the potential variation introduced into the inspection process by this calibration operation, the test matrix should include calibration repetitions, allowing random variations that are consistent with the process instructions. If more than one calibration standard is available (e.g., production sets), the effect of the variation between standards should also be considered as a test variable by repeating the specimen inspection after calibrating on each of the available standards.

6. Inspection Process: The inspection process specifies controls on such inspection parameters as dwell time, current direction, scan rates, and scan path index. The system test matrix should include evaluation of these parameters. If an allowable range is specified, the test plan should evaluate the inspection at the extremes of this range. If the parameter is held constant automatically, repetitions of the basic inspection may be a sufficient evaluation of this variable.

4.3.1.2  Test matrix

The  demonstrator  will  generate  a  test  matrix  to 
be  used  in  the  reliability  demonstration.  The 
test  matrix  is  a  list  of  planned  process  test 
conditions  which  collectively  define  one  or 
more  experiments  for  assessing  NDE  system 
capability.  A  process  test  condition  is  defined 
as  a  set  of  specific  values  for  each  of  the 
process  variables  deemed  significant  (see 
Appendix  A).  The  complete  set  of  test 
specimens  would  be  inspected  at  each  test 
condition  of  the  test  matrix.  The  complete 
matrix  can  comprise  more  than  one 
experiment  to  allow  for  preliminary  evaluation 
of  variables  which  may  only  marginally 
influence  inspection  response  of  the  system. 
To  the  extent  possible,  the  individual 
inspections  of  a  single  experiment  should  be 
performed  in  a  random  order  to  minimize  the 
effect  of  all  uncontrolled  factors  which  may 
influence  the  inspection  results. 

The inspection test conditions are to be representative of those that will be present at the time of a future inspection. Therefore, to eliminate potential bias, the values assigned to each test variable in a test condition must be selected at random from the population of possible values for that variable. For example, if a future inspection is to be performed by any of a given population of inspectors and three inspectors are to be included in the experiment, then the three inspectors should be chosen at random from the population. Similarly, if two different probes of identical design are to be used in the experiment, they should be selected at random from the population of probes. Note that if the population of probes (or inspectors) includes those not yet available, it must be assumed that the available probes (or inspectors) are representative of those that may be obtained in the future.
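Random selection of this kind can be made reproducible by drawing with a seeded generator. A minimal Python sketch, assuming the eligible population is available as a list; the inspector identifiers, the seed value, and the idea of recording the seed in the Demonstration Design Document are illustrative assumptions, not requirements of this document:

```python
import random

def pick_levels(population, k, rng):
    """Select k distinct levels (e.g. inspectors or probes) at random."""
    if k > len(population):
        raise ValueError("cannot select more levels than the population holds")
    return rng.sample(list(population), k)

# Hypothetical population of eligible inspectors; recording the seed lets
# the selection be repeated and audited.
eligible = ["INSP-01", "INSP-02", "INSP-03", "INSP-04", "INSP-05", "INSP-06"]
chosen = pick_levels(eligible, 3, random.Random(12345))
```

Sampling without replacement ensures that no inspector or probe is counted twice among the selected levels.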

The analysis methods for combining multiple inspections in the calculation of a single POD(a) function with confidence limits require that the levels of all of the variables be balanced. This is most easily achieved when the test matrix comprises a full factorial experiment, in which all combinations of all levels of the variables are in the test matrix. It is readily apparent that factorial experiments can rapidly lead to very large test matrices. There are other methods of designing balanced experiments in the statistical literature which do not require all combinations of the levels of the variables (cf. Appendix A, and Box, Hunter, and Hunter (1978)). These can and should be employed when necessary.
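A full factorial matrix and a randomized inspection order can be generated as follows. This is a sketch only: the variable names and levels are hypothetical, and the fractional and other balanced designs referred to in Appendix A are not covered. The example shows how quickly the matrix grows, since the number of test conditions is the product of the numbers of levels.

```python
import itertools
import random

def full_factorial(levels):
    """All combinations of all levels; `levels` maps variable name -> list of levels."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(levels[n] for n in names))]

def run_order(conditions, rng):
    """Shuffled copy of the test conditions, giving a random inspection order."""
    order = list(conditions)
    rng.shuffle(order)
    return order

# Hypothetical levels: 3 inspectors x 2 probes x 2 calibrations = 12 test
# conditions, each of which means inspecting the complete specimen set once.
levels = {"inspector": ["A", "B", "C"], "probe": ["P1", "P2"], "calibration": [1, 2]}
matrix = full_factorial(levels)
schedule = run_order(matrix, random.Random(7))
```

Adding a fourth variable with three levels would triple the matrix to 36 conditions, which is why the trade-offs discussed next must be settled before inspections begin.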

In general, a final test matrix is a compromise between the number of variables that can be included, the number of levels (values) for each of the variables, and the available time and money. To ensure that all desired objectives of the demonstration can be met, it is imperative that all trade-offs be evaluated before inspections begin.

It  should  also  be  noted  that  experiments  to 
evaluate  the  effects  of  inspection  process 
parameters  on  POD  can  be  designed  and 
analyzed  using  the  methods  of  appendices  A, 
C,  and  D.  Such  experiments  should  be 
performed  prior  to  the  capability  demonstration 
as  a  planned  approach  to  optimizing  the 
process. 

4.3.2  Test  Specimens 

The  test  specimens  must  reflect  the  structural 
types  that  the  NDE  process  will  see  in 
application  with  respect  to  geometry,  material, 
part  processing,  surface  condition,  and,  to  the 
extent  possible,  flaw  characteristics.  Since  a 
single  NDE  process  may  be  used  on  several 
structural  types,  multiple  specimen  sets  may 
be  required  in  a  reliability  assessment.  The 
demonstrator  will  determine  the  characteristics 
of  the  test  specimens  required  for  the 
demonstration  and  recommend  the  required 
number  of  flawed  and  unflawed  specimens. 

All test specimens available to the demonstrator will be evaluated to determine if existing test sets meet the requirements of the reliability demonstration. The demonstrator will ensure that the specimens are not familiar to the inspectors. Specimens which have become familiar to the inspectors will bias the resulting POD(a) curves and so will be considered unsuitable for reliability demonstration. When necessary, new specimen sets will be designed and fabricated to meet the requirements. A plan for maintaining and re-validating the specimens will be established. All of these results will be documented in the Demonstration Design Document. The following subsections present minimum considerations in obtaining and maintaining the demonstration test sets. Further guidelines for fabricating, documenting, and maintaining test specimens are presented in Appendix B.

4.3.2.1  Flaw sizes and number of flawed and unflawed inspection sites

The statistical precision of the estimated POD(a) function depends on the number of inspection sites with flaws, the size of the flaws at the inspection sites, and the basic nature of the inspection result (hit/miss or magnitude of signal response). Unflawed inspection sites are necessary in the specimen set to ensure the integrity of the demonstration and to estimate the rate of false indications. Regarding these topics, the following recommendations are made:

1. The flaw sizes should be uniformly distributed on a log scale covering the expected range of increase of the POD(a) function. Cracks which are so large that they are always found (or saturate the recording device) or so small that they are always missed (or yield a signal which is obscured by the system noise) provide only limited information concerning the POD(a) function. Since the region of increase of the POD(a) function is initially unknown, this range can only be estimated by engineering judgement. It should be noted that there is a tendency to include too many "large" flaws in NDE reliability demonstrations.

2. To provide reasonable precision in the estimates of the POD(a) function, experience suggests that the specimen test set contain at least 60 flawed sites if the system provides only hit/miss results and at least 40 flawed sites if the system provides a quantitative response, $\hat{a}$, to a flaw.

3. To allow for an estimate of the false call rate, it is recommended that the specimen set contain at least three times as many unflawed inspection sites as flawed sites. An unflawed inspection site need not necessarily be a separate specimen. If a specimen presents several locations which might contain flaws, each location may be considered an inspection site. To be considered as such, the sites must be independent; that is, knowledge of the presence or absence of a flaw at a particular site must have no influence on the inspection outcome at another site. It is advisable to have at least 10 to 20 unflawed specimens for fluorescent penetrant (PT) testing.
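The three recommendations above can be turned into a starting specimen plan. The sketch below spaces target flaw sizes evenly on a log scale between bounds chosen by engineering judgement and applies the minimum site counts from items 2 and 3. The bounds used in the example (0.005 to 0.100 inch) are hypothetical, and fabricated flaws will not match these nominal targets exactly.

```python
import math

def log_uniform_sizes(a_min, a_max, n):
    """n target flaw sizes evenly spaced on a log scale, a_min..a_max inclusive."""
    if not 0 < a_min < a_max or n < 2:
        raise ValueError("need 0 < a_min < a_max and n >= 2")
    lo, hi = math.log(a_min), math.log(a_max)
    step = (hi - lo) / (n - 1)
    return [math.exp(lo + i * step) for i in range(n)]

def minimum_sites(hit_miss):
    """Minimum (flawed, unflawed) inspection sites from items 2 and 3."""
    flawed = 60 if hit_miss else 40
    return flawed, 3 * flawed

# Hypothetical bounds (inches) for the region where POD(a) is expected to rise,
# for a system that provides a quantitative response.
flawed_sites, unflawed_sites = minimum_sites(hit_miss=False)
targets = log_uniform_sizes(0.005, 0.100, flawed_sites)
```

Equal spacing on the log scale means consecutive target sizes differ by a constant ratio, which keeps the plan from being weighted toward the "large" flaws cautioned against in item 1.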

4.3.2.2  Physical characteristics of the test specimens

The final geometry of the specimen shall present the same degree of difficulty to the NDE method to be used as the critical areas of the components to be inspected.

Specimens must represent the shapes of the actual hardware for inspections where probe manipulation and/or inspection media (such as magnetic field, sound waves, or line of sight) are geometry dependent. Bolt holes, flat surfaces, fillets, radii, and scallops are some typical shapes that influence inspections. Residual stress due to configuration may also influence the inspection. Another geometric consideration for all inspection techniques is flaw location, for example, corner flaws versus surface cracks. Flaws on specimens must be oriented and positioned to represent actual parts. The initial geometry of the specimen shall allow the insertion of flaws of the required shape and size in the specified locations. The specimen shall be designed such that the required flaws can be inserted and the final geometry then obtained by machining or other forming methods that retain the flaws at the necessary size, shape, and orientation and within 0.002 inches of the intended locations. Specimens should be manufactured to tolerances typical of the component they represent.

For ultrasonic, eddy current, and magnetic particle methods, the demonstrator shall select the same alloy, material form, and processing as the components to be inspected. For example, if an actual part is made of INCO 718, forged to near-finished shape, the specimen should be made of INCO 718 and fabricated by the same processes. In addition, for ultrasonic inspection, the internal noise and attenuation shall be as defined by the statement of work for the components to be inspected. For magnetic particle inspection, the magnetic properties shall be comparable to those of the components to be inspected.
The processing (forged, cast, or extruded) of the raw material and the heat treatment are critical to ensure that the specimen has the same metallurgical properties as the actual part. Since the surface condition of the specimen can significantly affect flaw detectability, the specimen surface condition should simulate that of the parts to be inspected. The surface condition of the final product and of the specimen will influence all inspection signal-to-noise ratios. Some examples follow. Grain size can have a large influence on the signal-to-noise ratio for ET and UT, and on the magnetic field for MT. Processing can also develop mechanical properties which influence PT results. Material strength can influence the amount of smeared metal, which can obscure defects from penetrant inspection, and residual compressive stress may influence PT or UT. Residual stresses can also be influenced by flaw propagation (flaws grow to relieve the stress field in which they reside) and by final machining. Final machining of the specimen should be consistent with final machining of the part. The surface finish of the specimen and the actual part should be consistent, so that the common surface finish provides similar signal responses. For example, if the part is turned on a lathe, the specimen should be turned on a lathe whenever possible. If the surface textures of the part and specimen are not similar, for instance a "record groove" finish on the part due to lathe turning and a ground finish on the specimen, the false call rate may be higher on the parts because of the macro finish of the record groove, even though the micro surface finishes are similar. This can be accounted for later by using real parts. If the surface condition is not known, the specimens may be made with a very good surface finish, and inspection of typical production components may be used to evaluate the expected noise.


4.3.2.3 Specimen Maintenance

The demonstrator shall derive a plan for protecting the specimens from mechanical damage and contamination that would alter the response of the NDE process for which they are used. As a minimum, this plan would require that the specimens be:

1. Individually packaged in protective enclosures when not in use;

2. Carefully handled when in use;

3. Cleaned immediately and returned to the protective enclosure after each use;

4. Re-validated at intervals specified by the contracting agency when the specimens are intended for periodic usage.

Specimen flaw responses should be measured periodically by an independent agency using the same test technique and procedure used in the original specimen verification (see Appendix B). The flaw response must fall within the range of the responses measured in the original verification process. If it does not, the results must be examined to determine whether they are acceptable, whether the specimen has been unacceptably compromised, or whether the specimen needs to be recharacterized and reverified.

When  multiple  specimen  sets  are  required  for 
periodic  use,  the  demonstrator  shall  initially 
select  one  set  as  a  master  set.  The  remaining 
sets  shall  be  demonstrated  to  have  a 
response  within  a  specified  tolerance  of  the 
master  set.  Periodic  re-verification  against 
the  master  set  can  then  be  performed. 
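The two re-verification rules above (each flaw response within the range recorded at original verification, and each additional specimen set within a specified tolerance of the master set) can be sketched as simple checks. This is a minimal illustration, assuming the original verification responses for each flaw are kept as a list and that the tolerance is expressed as a relative fraction; `FlawVerification`, `out_of_range_flaws`, and `within_master_tolerance` are hypothetical names, not part of any referenced program.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlawVerification:
    """Hypothetical record of the original verification of one specimen flaw."""
    flaw_id: str
    original_responses: list[float]  # responses measured during original verification


def out_of_range_flaws(verifications: list[FlawVerification],
                       new_responses: dict[str, float]) -> list[str]:
    """Return IDs of flaws whose re-measured response lies outside the
    min-max range of their original verification responses. A flagged flaw
    is not automatically rejected; it must be examined further."""
    flagged = []
    for v in verifications:
        response = new_responses[v.flaw_id]
        low, high = min(v.original_responses), max(v.original_responses)
        if not low <= response <= high:
            flagged.append(v.flaw_id)
    return flagged


def within_master_tolerance(master: dict[str, float],
                            candidate: dict[str, float],
                            rel_tol: float) -> bool:
    """True if every flaw response in a candidate set is within rel_tol
    (e.g. 0.10 for 10 percent; an assumed form of tolerance) of the
    corresponding master-set response."""
    return all(abs(candidate[k] - m) <= rel_tol * abs(m) for k, m in master.items())
```

For instance, a flaw originally verified at responses of 2.0 and 2.4 that now reads 2.5 would be returned by `out_of_range_flaws` for examination.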

4.3.2.4 Engine Hardware Specimens

Note that in many cases, when a development system is first being evaluated, the specific part geometries and surface conditions may not be known or, if known, representative flawed specimens may not be available. This emphasizes the necessity of inspecting actual engine hardware as part of the qualification program. Again, these parts may not reflect exactly the conditions to be seen in the specific application of the system, but they will be significantly more realistic than laboratory flawed specimens alone. The engine parts should also contain defects to provide signals for the inspection. For ET and MT systems, EDM notches may be sufficient for evaluating scan plan coverage but will be inadequate to assess system response to actual fatigue flaws. For UT, drilled holes may be preferable; for PT, fluorescent markings may be the best available, though they may be too bright to verify system capabilities. An ideal test would use actual service-flawed hardware, if a representative selection of such parts can be collected.


4.3.3  Test  Procedures 

The demonstrator will develop and report a detailed plan for executing the demonstration tests at the application facility. The procedures used in the demonstration must follow the procedures and work instructions planned for the production inspection of parts. This includes all fixed process parameters, data analysis algorithms (for automated systems), accept/reject criteria, and other items covered by the System Configuration Control Document, which contains the information that governs the system configuration so that a stable baseline is established. The inspections should be performed by production inspectors, as designated by the experimental design. A test monitor should be designated who will assure that all requirements of this paper are being met both prior to initiation and during the performance of the tests. Every inspection technology depends on certain conditions being met that the operator may not be able to verify as part of the daily inspection setup. Examples include the scan speed or index of mechanical manipulators, the drive frequencies of eddy current or ultrasonic instruments, and the purity of the chemicals or solutions being used. Prior to the NDE system evaluation, it is important that significant variables such as these be calibrated, preferably using NIST-traceable standards and procedures. Any non-conformance that is not corrected will likely degrade the NDE system performance. Periodic recalibration of the NDE system after acceptance should be conducted in accordance with local procedures.

In addition to the specific requirements of the NDE process (Section 5), the following must be considered in the development of the test procedure plan:

1. System software controlling any data collection, reduction, and processing must be the software planned for use in production implementation. Any differences between the test and reality could negate the ability of the POD curve to be applied to the actual testing situation.

2. Appropriate fixturing of specimens can make the inspection procedure similar to that for actual parts; that is, the demonstration fixturing and the actual component would ideally have the same inspection system arrangement of probe, orientation, manipulation, and scan plan.

3. Signal evaluation and decision levels used during the testing should be those planned for use in production. In many cases it may not be known in advance what thresholds can be practically implemented in production; in such a situation, the detection capabilities should be established as a function of these process parameters.

4.  Scanning  motions  for  the  test 
demonstration  should  be  similar  to 
those  planned  for  production.  This 
similarity  should  extend  to  the 
manipulator  axes  used,  feeds  and 
speeds,  alignment  routines  (such  as 
eddy  current  bolthole  probe 
centering),  and  scanning  procedures. 
This  may  not  be  strictly  possible  for 
the  inspection  of  some  of  the  LCF 
specimens,  but  every  effort  to  achieve 
similarity  should  be  made. 

5. Accurate data acquisition, recording, and documentation are also important. The data should be recorded in a form that is compatible with the disposition of the part. For example, an eddy current inspection may record the data as the voltage output of the signal, â, or as a signal-processed calculated "depth", d̂. If the part is to be rejected by d̂ (which is not a recommended practice) but the demonstration data were recorded and analyzed in â, the reject standard separating good from bad parts would necessarily be in terms of â. The reject level for actual parts would therefore be unknown, because â cannot easily be converted to d̂, which is based on some signal processing algorithm rather than on the mandatory break-open data for specific geometries and stress fields. The test would then have to be repeated, and the appropriate data, d̂ in this example, collected and reanalyzed in that metric. Proper planning prior to data collection will avoid such difficulties and provide meaningful results the first time.
4.3.4 Demonstration Process Control

The demonstrator will develop a plan for ensuring that the NDE process is in a state of control at the start of the demonstration and remains in that state of control throughout the demonstration period, regardless of its length. The plan will include routine quality, instrumentation, and calibration checks, and should also incorporate inspection responses to real structure or specimens. The process control plan should be the basis for process control during extended periods of production inspections using the system (Section 4.2).

4.4  DEMONSTRATION  TESTS 

The  sets  of  inspections  as  defined  in  the 
Demonstration  Design  Document  will  be 
carried  out  at  the  production  inspection  facility 
under  normal  operational  conditions.  The  test 
monitor  will  be  available  during  all  testing. 
Inspectors  will  inspect  all  specimens  in 
accordance  with  the  Demonstration  Design 
Document,  the  matrix  of  test  variables,  the 
applicable  NDE  process  specifications,  and 
any  work  instructions  deemed  necessary  for 
the  inspection  of  the  test  specimens  for  the 
reliability  test  program.  The  inspection 
procedures  will  conform  to  the  test  procedures 
used  for  production  components,  modified 
only  as  necessary  to  accommodate  the  test 
specimen  configuration.  A  log  will  be  kept  of 
the  inspections,  showing  the  order  in  which 
the  inspections  were  performed,  the  inspector 
who  performed  the  inspection,  the  date  and 
time  the  inspection  was  performed,  the  serial 
number  and  the  specification  identification. 

The  inspector  will  prepare  a  report  (or  collect 
required  data  from  automated  reporting 
systems)  on  each  inspection  performed.  The 
reports  will  be  delivered  to  the  test  monitor 
and  will  contain,  as  a  minimum,  the  inspector 
identification  (possibly  coded),  specimen 
identifications  including  any  serial  numbers, 
inspection  date  and  time,  and  the  results  of 
the  inspections  including  the  NDE  responses 
and  locations  of  any  indicated  defects.  The 
data  collection  must  be  compatible  with  the 
reporting  requirements  of  Section  4.5. 
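The log and report contents listed above map naturally onto a simple record type. A minimal sketch, assuming Python dataclasses; the class and field names are hypothetical and merely mirror the minimum report contents (coded inspector identification, specimen identification and serial number, date and time, NDE responses and indication locations) plus the performance order kept in the inspection log.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Indication:
    response: float   # NDE response, e.g. peak signal amplitude
    location: str     # location of the indicated defect on the specimen


@dataclass
class InspectionReport:
    sequence: int            # order in which the inspection was performed
    inspector_code: str      # inspector identification (possibly coded)
    specimen_id: str         # specimen identification, including serial number
    performed_at: datetime   # inspection date and time
    indications: list[Indication] = field(default_factory=list)

    def log_line(self) -> str:
        """One line of the inspection log, in order of performance."""
        return (f"{self.sequence:04d} {self.performed_at:%Y-%m-%d %H:%M} "
                f"{self.inspector_code} {self.specimen_id} "
                f"{len(self.indications)} indication(s)")
```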

In the event of a failure in one or more of the systems during the performance of the demonstration test program, the demonstrator will remedy the cause of the failure. The periodic evaluation (cf. paragraph 4.3.4) for assuring that the process is under control will then be performed to confirm that no problems have arisen due to the failure. The particular matrix element being evaluated at the time of the failure will be completely reevaluated.

With the agreement of the contracting agency, preliminary tests of the system may be carried out at the contractor's facility. Tests at the contractor's facility, however, should be directed toward preliminary acceptance, and the results should not be used to modify hit/miss decision criteria.

4.5  Data  Analysis 

The  purpose  of  the  NDE  demonstration  is  to 
produce  quantitative  descriptions  of  inspection 
system  performance,  POD(a)  curves,  and 
statistics  for  comparing  NDE  systems  based 
on  these  curves  and  statistics. 

Inspections can be grouped into two categories: those for which only the inspection outcome is known, hit or miss, and those providing additional information as to apparent flaw size, â vs. a.

The  analysis  of  these  data  to  produce  POD(a) 
curves  is  to  be  accomplished  using  a  standard 
IBM  PC  computer  program  which  can  be 
supplied  by  the  USAF.  The  latest  version  of 
the  program  and  user's  manual  can  be 
obtained  from  ASC/ENFSA,  Wright-Patterson 
AFB,  OH  45433. 

4.5.1  Missing  Data 

It is important that all of the inspections called for by the test matrix be performed. If the design of the experiment is a factorial (all possible combinations of the factors being varied) and some of the inspections are not performed, the POD analysis program cannot be used directly; the assistance of a professional statistician is recommended in evaluating such data. If the experiment is designed to evaluate only the variability associated with different flaws and one other factor, the POD analysis program will provide valid answers even if some of the inspections are not performed.

Note that the program distinguishes between a missing inspection (i.e., no inspection result was obtained) and a missed flaw (i.e., the inspection was performed but the flaw was not detected). See the user's manual for details.
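The distinction matters because a missing inspection drops out of the count of inspections for that flaw, whereas a missed flaw counts against detection. A minimal sketch of per-flaw observed detection proportions, assuming outcomes are coded as True/False/None (a hypothetical convention chosen for illustration, not the USAF program's input format):

```python
from __future__ import annotations

from typing import Optional


def observed_detection_probabilities(
        results: dict[str, list[Optional[bool]]]) -> dict[str, tuple]:
    """Summarize outcomes per flaw. Each outcome is coded as
    True  -> inspection performed, flaw detected (hit)
    False -> inspection performed, flaw not detected (missed flaw)
    None  -> no inspection result obtained (missing inspection).
    Returns flaw_id -> (hits, inspections performed, observed proportion),
    where the proportion is None if the flaw was never inspected."""
    summary = {}
    for flaw_id, outcomes in results.items():
        performed = [o for o in outcomes if o is not None]  # missing inspections drop out
        hits = sum(performed)                               # True counts as 1, False as 0
        proportion = hits / len(performed) if performed else None
        summary[flaw_id] = (hits, len(performed), proportion)
    return summary
```

A flaw with outcomes `[True, False, None, True]` yields two hits in three performed inspections, not two in four.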

The statistical methods employed to generate these curves for both types of NDE data, the procedures for estimating their confidence limits, and the analysis techniques for comparing POD curves are described in Appendices A-D.

The  design  of  the  NDE  demonstration 
(Section  4.2  and  Appendix  A)  provides  the 
foundation  for  the  entire  system  evaluation. 

No  amount  of  clever  analysis  can  overcome  a 
poorly  designed  experiment. 

4.6  Presentation  of  Results 

The demonstrator should submit a permanent record of data and a summary test report for each NDE reliability experiment. To facilitate potential inclusion in a database, the data will be partitioned into four areas:

1. The description of the NDE system.

2. The experimental design.

3. The individual test results.

4. The summary test results.

Each  experiment  will  be  assigned  a  unique 
identification.  The  identification  will  comprise 
codes  which  identify  the  NDE  method,  the 
NDE  system,  the  inspecting  organization,  the 
type  of  specimens,  and  an  experiment 
number.  The  identification  numbers  should 
be  assigned  by  the  contracting  agency. 
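Because the identification is the only tie between the four data categories, it helps to compose and parse it mechanically. A minimal sketch, assuming a hyphen-separated layout and illustrative codes such as "ET" for the method; the real code values and layout are set by the contracting agency, and the function names are hypothetical.

```python
def experiment_id(method: str, system: str, organization: str,
                  specimen_type: str, number: int) -> str:
    """Compose an experiment identification from its component codes.
    The hyphen-separated layout is an assumption for illustration."""
    codes = [method, system, organization, specimen_type]
    if not all(code and "-" not in code for code in codes):
        raise ValueError("each code must be non-empty and must not contain '-'")
    return "-".join(codes + [f"{number:03d}"])


def parse_experiment_id(exp_id: str) -> dict:
    """Split an identification back into its components, so that records in
    the four data categories can be matched on it."""
    method, system, organization, specimen_type, number = exp_id.split("-")
    return {"method": method, "system": system, "organization": organization,
            "specimen_type": specimen_type, "number": int(number)}
```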

The  experiment  identification  code  is  the  tie 
between  the  four  data  types.  Data  included  in 
one  of  the  categories  need  not  be  repeated  in 
another  but,  for  ease  of  access,  general 
information  will  be  repeated  on  the  various 
submittal  forms.  The  data  to  be  submitted  for 
the  permanent  record  will  be  from  all  four 
categories  and  will  comprise  data  sheets, 
tables,  and  plots  as  described  below. 

4.6.1  Category  I  -  NDE  System 

The System Configuration Control Document must be sufficiently detailed to account for all factors that have a major influence on the accept/reject decision. The purpose of recording this information is to identify specifically the system that was evaluated. If the results are to be extrapolated to different, but similar, systems, it should be possible to identify and evaluate the sources of potential differences between the systems. The minimum information required in the description of each NDE method is listed in the data sheets in the specific requirements of Section 5.

4.6.2 Category II - Experimental Design

The experimental design identifies the specimen set to be used in the demonstration; the test matrix of the levels of the controlled factors and the number of replications of test conditions; and the order in which the steps of the test matrix are to be run. Note that the specimen set determines the number of flaws in the experiment, while the number and levels of the controlled factors determine the number of inspections of each flaw. All specimens will be subjected to the inspections specified by the combinations of the levels of the controlled factors in the Demonstration Design Document.

Sample data report sheets are included in Appendix E and are discussed here as an example. Assume that the assessment of an eddy current system was to include the effects of two operators, two probes, and two replications. Figure E-1 is an example data sheet for reporting these data as a list of the test combinations. The same information is contained in the table of test conditions of Figure E-2. This latter format is unwieldy if the experiment contains many (more than four) factors or many (more than three) levels of the factors. However, the table format shows more clearly the levels of all of the factors being evaluated and could assist in the analysis of the data.

A unique test identification is assigned to each combination of levels of the factors (each line of the test matrix) to facilitate reporting individual test results. The test identifications in the examples correlate exactly with the levels of the experimental factors. This degree of identification refinement is not necessary but, if used consistently, aids in the interpretation of data from different experiments.

4.6.3 Category III - Individual Test Results

The data collected during the actual inspections are not necessarily the data to be recorded in the permanent individual test results of the experiment. However, the original data must be preserved by the organization conducting the experiment to resolve problems that may arise. In general, inspection result data sheets will be obtained from the original data recordings and will summarize the findings of all inspections of each flaw. Figure E-3 is the data sheet for the permanent record of the individual test results of an inspection experiment; it also arranges the data in a convenient format for input to the analysis programs. A magnetic disk containing the inspection result input files in IBM PC-compatible format should be submitted with the summary of experimental results.

4.6.4  Category  IV  -  Summary  Results 

Summary results are obtained from the analysis of the individual test results for a particular experiment. These may include POD(a) function parameters, plots of POD(a) functions, plots of log â versus log a, verification of the assumptions of the analysis, and an analysis of the significance of test variables (if called for by the objectives of the experiment), as specified by the contracting agency. All of this information will become part of the permanent record of each NDE experiment.

The PC software analysis program will automatically output the required summary statistics for a given analysis. When requested, the program will also generate files for plotting POD(a) vs. a, the lower confidence bound on POD(a) vs. a, the observed detection probabilities for each flaw vs. a, and log â vs. log a. Figures E-4 and E-5 are examples of summary output from â vs. a and hit/miss analyses, respectively. In both of these examples, the analysis provided complete sets of parameter estimates. If the likelihood equations cannot be maximized for a particular data set, the program so indicates. In either type of analysis, if the probability of detection is not significantly related to flaw size, the lower confidence bound on the POD(a) function will not be monotonically increasing. In this case, the program does not output an estimate of a lower confidence bound on POD(a) and writes a message that the model does not adequately fit the data. Tests of the assumptions of the analysis should be made on the basis of the log â vs. log a data (for â vs. a data) and from the superposition of the POD(a) function on the observed detection probabilities (for hit/miss data). Other analysis procedures are discussed in Appendices C and D. All departures and potential discrepancies from the standard analysis should be specifically identified and reported.
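For â vs. a data, the assumption checks above rest on a linear relation between log â and log a with normally distributed, equal-scatter residuals. Under that model, ln â = β0 + β1 ln a + ε with ε ~ N(0, τ²), and for a decision threshold â_dec the POD function is POD(a) = Φ((ln a − μ)/σ), with μ = (ln â_dec − β0)/β1 and σ = τ/β1. The sketch below estimates these by ordinary least squares. It assumes no censored responses (nothing below the recording threshold or at saturation), uses natural logarithms and an n − 2 divisor for τ, and omits the lower confidence bound, so it illustrates the model rather than substituting for the USAF program; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fit_ahat_vs_a(a, ahat, ahat_dec):
    """Least-squares fit of ln(ahat) = b0 + b1*ln(a) + e and the resulting
    POD(a) parameters mu, sigma for decision threshold ahat_dec."""
    x = [math.log(v) for v in a]
    y = [math.log(v) for v in ahat]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    tau = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual std. dev.
    mu = (math.log(ahat_dec) - b0) / b1
    sigma = tau / b1
    return {"b0": b0, "b1": b1, "tau": tau, "mu": mu, "sigma": sigma,
            "residuals": residuals}


def pod(a, mu, sigma):
    """POD(a) = Phi((ln a - mu) / sigma)."""
    return NormalDist().cdf((math.log(a) - mu) / sigma)


def a_p(p, mu, sigma):
    """Flaw size detected with probability p (point estimate, no confidence bound)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))
```

The returned residuals can be plotted against log a to check the linearity and equal-scatter assumptions before the POD(a) parameters are used.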

Figures E-6 and E-7 are the POD(a) functions and 95 percent confidence limits for the example analyses of Figures E-4 and E-5, respectively. These figures indicate the information that must be included on all plots of POD(a) functions when used to illustrate the capability of an inspection system for each of the basic types of inspection data. Figure E-8 presents the log â vs. log a data for the analysis of Figure E-4. These plots must be generated for all sets of â vs. a data. Any deviations from the assumptions must be corrected prior to analysis (e.g., by restricting the set of test flaws to a range where log â vs. log a is linear) or specifically noted on all characterizations of the capability of the system. For hit/miss data, the estimated POD(a) function should be compared to the detection probabilities for each flaw in the specimen set, as in Figure E-9.
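For hit/miss data, a POD model often used is the log-logistic (log-odds) function, POD(a) = 1/(1 + exp(−(β0 + β1 ln a))), fitted by maximum likelihood with each inspection of each flaw treated as an independent Bernoulli trial. The sketch below does this with Newton-Raphson iterations; failure to converge (for example, when every small flaw is missed and every large flaw is found, so no finite maximum exists) mirrors the case described earlier in which the likelihood equations cannot be maximized. The model choice and function names are assumptions, and no confidence bound is computed.

```python
import math


def _pod(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_hit_miss(a, hit, iterations=50, tol=1e-10):
    """Maximum-likelihood fit of POD(a) = 1 / (1 + exp(-(b0 + b1 * ln a)))
    to hit/miss data (hit[i] is 1 for a detection, 0 for a miss).
    Raises if no finite maximum is found."""
    x = [math.log(v) for v in a]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, hit):
            p = _pod(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p                 # score vector (gradient of log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                     # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if not det > 0:                  # also catches NaN
            raise ValueError("information matrix is singular")
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step: inverse information times score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            return b0, b1
    raise RuntimeError("likelihood could not be maximized (no convergence)")
```

The fitted curve can then be evaluated at each flaw size and overlaid on the observed per-flaw detection proportions, as in Figure E-9.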

4.6.5  Summary  Report 

The  results  of  each  capability  experiment  will 
be  documented  in  a  summary  report  as 
specified  by  the  contracting  agency.  This 
report  will  interpret  the  results  of  the 
experiment  and  conclude  whether  or  not  the 
system  met  specifications.  If  the  system  failed 
to  meet  the  specification,  the  cause  and 
reason  for  the  failure  will  be  identified.  Future 
actions  regarding  qualification  of  the  system 
will  be  presented.  As  a  minimum,  this  report 
will  contain  the  following  information: 

1. The NDE system description data sheet;

2. A description of the factors included in the experimental design and the levels of each factor;

3. The output summary sheets from the analysis;

4. Plots of log â vs. log a, if applicable;

5. A plot of the properly annotated POD(a) function and its lower 95 percent confidence bound;

6. A plot of the POD(a) function superimposed on the observed detection probabilities for hit/miss data;

7. A statement concerning the validity of the assumptions of the analyses: a linear relation between log â and log a, and approximately equal scatter of the residuals;

8. Identification of the significance of test factors and interpretation in terms of capability characterization;

9. A statement of conclusions and recommendations for further actions.

More  than  one  experiment  can  be 
documented  in  the  same  report  but  the 
information  from  each  experiment  must  be 
contiguous.  Comparisons  of  data  from 
different  experiments  and  extensive 
summaries  across  comparable  experiments 
are  recommended  whenever  possible. 

4.7  RETESTING 

If  the  system  does  not  meet  the  capability  and 
reliability  requirements  of  the  contract,  the 
demonstrator  must  conduct  a  review  of  the 
possible causes of the failure. This review may include some of the multi-factor statistical analysis described in Appendix A, as well as functional tests of the various subsystems. A plan must be generated that discusses the possible causes of the failure and describes how the system will be modified and what additional testing will be performed. This new plan will be, in effect, a second Demonstration Design Document (Section 4.3), except that it will also include the discussion of the possible reasons for the failure and what will be done about them.

4.8  PROCESS  CONTROL  PLAN 

After the system has been demonstrated to be reliable by satisfying the requirements specified by the contracting agency, the demonstrator should provide a written plan for assuring that the process is under control.

This  plan  will  include  a  periodic  evaluation  of 
the  processes  involved  including  all 
mechanical,  electrical,  calibration,  and 
computing  systems.  Control  charts  or  other 
proper  permanent  records  will  be  required  as 
an  integral  part  of  the  plan. 
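The text does not prescribe a particular chart. One common choice for a single periodic reading, such as the response of a reference flaw recorded at each scheduled check, is a Shewhart individuals chart, whose limits are the baseline mean ± 3σ̂, with σ̂ estimated as the average moving range divided by d2 = 1.128. A minimal sketch under those assumptions (function names are hypothetical):

```python
def individuals_chart_limits(baseline):
    """Shewhart individuals chart limits (LCL, center line, UCL) from
    baseline readings taken while the process was known to be in control."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline readings")
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128  # d2 for n = 2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def out_of_control(readings, limits):
    """Indices of periodic readings falling outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, r in enumerate(readings) if not lcl <= r <= ucl]
```

An out-of-control reading would then trigger the investigation and recalibration steps that the written plan defines.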

5.0  SPECIFIC  REQUIREMENTS 

The demonstrator shall establish the basic process parameters prior to conducting the reliability demonstration. Once the demonstration has been completed, the process parameters used in the demonstration shall not be changed without another demonstration program that shows the effect of changing the parameter. The reliability of the system, the overall POD curve, and its lower bound will be determined from a statistical experimental design. A factorial design is preferred. A discussion of a factorial design and the sampling approach is given in the appendix.

5.1  Eddy  Current  Systems 

5.1.1  Demonstration  Design 

5.1.1.1 Test Parameters

The  demonstration  design  for  the  capability 
and  reliability  of  the  eddy  current  system  shall 
include,  but  not  be  limited  to,  the  following  test 
variables.  These  requirements  are  in  addition 
to  those  listed  in  Section  4.3. 

a.  Inspector  Changes 

b.  Sensor  Changes 

c.  Loading  /  Unloading  of  Specimens 

d.  Specimen  Position 

e.  Calibration  Repetition 

f.  Calibration  Standard  Variation,  if 
applicable 

g.  Test  Repetition 


5.1.1.2 Fixed Process Parameters

Fixed process parameters shall include, but not be limited to, the following. These parameters will be required to mirror actual production inspection. Some of these parameters may be included in the matrix of test variables, if desired.

a.  Drive  frequency 

b.  Coil  frequency  and  design 

c.  Probe  body  and/or  holder  design 

d.  Scanning  technique 

1)  Index  amount 

2)  Scanning  speed 
e. Digitization rate, if applicable

f. Digitization resolution, if applicable

g.  Threshold  levels 

h.  Filter  values,  low-pass  and  high-pass 

i.  Hardware  and  software  configuration 
control  number 

5.1.2  Specimen  Fabrication  and 
Maintenance 

Specimens  for  the  evaluation  of  eddy  current 
inspection  systems  should  have  surface 
connected  flaws,  generated  as  described  in 
Section  4.3.2.  Following  the  initiation  of  the 
cracks  and  the  grinding  off  of  the  EDM 
notches,  the  specimens  should  be  further 
stress  cycled  to  break  the  crack  through  any 
metal  that  may  have  been  smeared  over  the 
cracks.  At  that  time,  the  crack  lengths  should 
be measured. This is best done by loading the specimen to 60% of the load used to grow the cracks and optically measuring the length using a 40X magnifier. To characterize
cracks  further,  a  representative  sample  should 
be  dyed  or  heat  tinted  and  the  cracks  broken 
open  to  confirm  the  surface  length 
measurements  and  to  establish  the  crack 
depths  and  shapes. 

Either crack area or crack depth, as agreed to by the contracting agency, can be used to characterize the cracks. To make the crack area more readily relatable to the detection requirements for a given application, it can be expressed in terms of the radius of a circular-sector crack of that area. The sector is a quarter circle for corner cracks and a half circle for surface cracks. The actual crack aspect ratio (ratio of surface length to depth) is to be determined by break-open procedures.
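The equivalent radius follows directly from the sector area: for a quarter-circle corner crack A = πr²/4, so r = √(4A/π); for a half-circle surface crack A = πr²/2, so r = √(2A/π). A minimal sketch (the function name is hypothetical):

```python
import math


def equivalent_radius(area: float, crack_type: str) -> float:
    """Radius of the circular sector whose area equals the measured crack area:
    a quarter circle for corner cracks, a half circle for surface cracks."""
    sector_fraction = {"corner": 0.25, "surface": 0.5}
    if crack_type not in sector_fraction:
        raise ValueError("crack_type must be 'corner' or 'surface'")
    return math.sqrt(area / (sector_fraction[crack_type] * math.pi))
```

For example, `equivalent_radius(0.5, "surface")` gives about 0.564, in the length unit whose square is the area unit.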

The  inspectors  should  be  provided  the 
orientation  of  potential  cracks  in  the 
specimens,  but  should  not  know  if  a  particular 
specimen  is  cracked,  or  if  cracked,  the  specific 
location  of  those  cracks. 

The eddy current process does not itself degrade the specimens' condition, so no special precautions need to be taken for specimen maintenance beyond those listed in Section 4.3.2.3. An exception is the practice of touching the part with a metal probe during part alignment, as is sometimes done with a typical non-contact bolthole or scallop inspection. In this case, the test procedures must clearly prohibit this practice, to prevent damage to the cracked specimens.

5.1.3  Testing  Procedures 

5.1.3.1 Test Definition

Procedures shall be written prior to the test, clearly describing what tests are to be conducted and the exact procedures for conducting them. They should be at the same level of detail as the day-to-day procedures under which production inspectors operate. In addition to the items outlined in 5.1.1, other items to be specified in this test definition are the following:

1. Part preprocessing requirements, as appropriate. This will be more of an issue for the inspection of actual production engine parts; preprocessing of the test specimens should be limited to cleaning only.

2. System inspector requirements. This will frequently refer to qualification/training requirements, but will also include the number of inspectors to be included in the test plan. At the start of the test matrix, this may typically call for three inspectors to be involved in the system evaluations. This number is specified by the demonstration design.

3.  Inspection  materials  are  not  a 
significant  variable  for  eddy  current 
inspections. 

4.  Depending  upon  the  degree  of  system 
automation,  sensors  may  be  the  most 
significant  variable  to  be  considered. 
The  test  plan  should  require  the 
evaluation  of  the  system  using  at  least 
two  samples  of  each  distinct  coil  type 
used  (such  as  end  mount  or  side 
mount  absolute  coils,  differential, 
reflection,  printed  circuit,  etc.).  The 
probe  body  needs  to  be  a  factor  in  this 
evaluation  only  to  the  extent 
necessary  to  allow  inspection  of  the 
specific  specimen  designs. 

5.  Inspection  setup  (calibration)  must  be 
conducted  using  the  same  procedures 
planned  for  use  in  production.  The 
signal  responses  must  be  set  to  the 
same  values,  with  the  same 
tolerances  in  both  situations. 
6. The production inspection process
must  be  duplicated  in  the  tests  as 
much  as  possible.  Thus  the  inspection 
feed  rates,  scan  index  rates,  drive 
signal  frequencies,  filter  settings  and 
any  signal  processing  must  be  the 
same.  Because  the  cracked 
specimens  may  differ  physically  from 
the  real  parts  to  be  inspected  in 
production,  the  scanning  motions  for 
the  specimens  may  necessarily  differ 
from  those  used  for  the  parts.  Efforts 
should  be  made  to  minimize  the 
differences,  and  recognized 
differences  should  be  documented. 

For  automated  systems,  software 
package  version  and  revision  numbers 
must  be  specified. 

7.  Inspection  thresholds  used  in  the  test 
should  be  the  same  as  those  planned 
for production use. Inspection of the
actual engine part specimens will help
to  establish  how  realistic  those 
thresholds  are  for  production 
inspections.  Where  the  specific 
application  of  the  system  is  known, 
typical  production  parts  should  be 
used  to  determine  practical 
thresholds.  It  may  be  desirable  to 
inspect  the  specimens  at  as  low  a 
threshold  as  possible,  to  establish  the 
detection  capabilities  as  a  function  of 
thresholds used. This will allow trade-offs
to be made between detection
capability  and  production  throughput. 
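Item 7 suggests inspecting the specimens at the lowest practical threshold so that detection can be traded against production throughput. A minimal sketch of that tally, assuming each inspection site yields one peak signal response and that crack-free sites are part of the specimen set: for each candidate threshold it counts the fraction of crack sites called and the fraction of crack-free sites called. The function name and signal values are hypothetical, and the pooled hit fraction is only a screening aid, not the size-dependent POD analysis of Appendix C.

```python
def threshold_tradeoff(crack_signals, blank_signals, thresholds):
    """Tally calls at each candidate threshold (hypothetical helper).

    crack_signals: peak responses recorded at known crack sites
    blank_signals: peak responses recorded at crack-free sites
    Returns (threshold, fraction of cracks called, fraction of blanks called).
    """
    rows = []
    for t in thresholds:
        hits = sum(1 for s in crack_signals if s >= t)
        false_calls = sum(1 for s in blank_signals if s >= t)
        rows.append((t, hits / len(crack_signals), false_calls / len(blank_signals)))
    return rows


# Hypothetical peak responses (arbitrary units)
cracks = [0.8, 1.4, 2.2, 3.1, 4.0, 5.5]
blanks = [0.3, 0.5, 0.6, 0.9, 1.1, 0.4, 0.2, 0.7]
for t, hit_frac, fc_frac in threshold_tradeoff(cracks, blanks, [0.5, 1.0, 2.0]):
    print(f"threshold {t:.1f}: cracks called {hit_frac:.2f}, blanks called {fc_frac:.2f}")
```

Lowering the threshold raises both fractions together; the production threshold is then the lowest one whose false-call fraction the production line can tolerate.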

5.1.3.2 Test Environment

The environment in which the test is run
should match the anticipated production
environment as closely as possible, and the test
should be conducted at the production site if possible. If
the  system  is  a  new  development,  the  initial 
tests  may  need  to  be  conducted  at  the 
manufacturer's  facility.  To  the  extent  possible, 
production  conditions  should  be  met.  It  is 
suggested  that  the  manufacturer  conduct  a 
first  evaluation  prior  to  shipping  the  equipment 
and  a  second  test  one  or  two  months  after 
the  system  is  installed  on  site. 

5.1.4  Presentation  of  Results 

Documentation  of  test  results  should  include 
all  raw  data  from  the  tests.  If  some  of  the  data 
is  classed  as  irrelevant  and  not  included  in  the 
data  reduction  process,  this  must  be  noted, 
and an explanation given for why this decision
was made (for example, an indication was subsequently
demonstrated to be due to a power surge, or
to inadequate cleaning of the specimen). This provides the customer the
option  of  accepting  or  rejecting  that  rationale. 

Data  for  the  permanent  record  of  eddy  current 
NDE  reliability  experiments  will  be  submitted 
in  accordance  with  the  requirements  stated  in 
Section 4.6. Figure 5-1 presents an example
of the type of information required for
description of eddy current inspection
systems. Eddy current data should be in the
â vs. a format and analyzed accordingly
(see Appendix C-2).
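The â vs. a analysis itself is given in Appendix C-2. As a rough sketch of the idea, assuming the commonly used model ln â = β0 + β1 ln a + ε with normally distributed scatter of constant variance, a least-squares fit plus a decision threshold â_th gives POD(a) = Φ((ln a − μ)/σ_POD), where μ = (ln â_th − β0)/β1 and σ_POD = σ/β1. The function name and data are hypothetical; censored responses (below the noise floor or saturated) and confidence bounds are not handled.

```python
import math
from statistics import NormalDist


def ahat_vs_a_pod(sizes, responses, threshold):
    """Fit ln(ahat) = b0 + b1*ln(a) by least squares and return POD(a) (hypothetical helper).

    Assumes a linear log-log relation with constant, normally distributed scatter.
    Censored responses and confidence bounds are not handled.
    """
    x = [math.log(a) for a in sizes]
    y = [math.log(r) for r in responses]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual std. deviation
    mu = (math.log(threshold) - b0) / b1  # ln(a) at which POD = 0.5
    sigma_pod = sigma / b1

    def pod(a):
        return NormalDist().cdf((math.log(a) - mu) / sigma_pod)

    return pod, b0, b1, sigma


# Hypothetical crack sizes (in.) and peak responses (arbitrary units)
sizes = [0.005, 0.010, 0.020, 0.040, 0.080]
responses = [0.014, 0.025, 0.060, 0.10, 0.23]
pod, b0, b1, sigma = ahat_vs_a_pod(sizes, responses, threshold=0.05)
for a in (0.010, 0.020, 0.040):
    print(f"a = {a:.3f} in.: POD = {pod(a):.3f}")
```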

5.2 Fluorescent Penetrant Testing Systems

5.2.1  Demonstration  Design 

5.2.1.1 Test Parameters

The  demonstration  design  for  the  capability 
and  reliability  of  the  fluorescent  penetrant 
system  shall  include,  but  not  be  limited  to,  the 
following  test  variables.  These  requirements 
are  in  addition  to  those  listed  in  Section  4.3. 

a.  Inspector  Changes 

b.  Sensor  Changes 

c.  Loading/Unloading  of  Specimens 

d.  Specimen  Position 

e. Calibration Repetition

f.  Calibration  Standard  Variation,  if 
applicable 

g.  Test  Repetition 
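One way to lay out these test variables is as a full factorial matrix in which every combination becomes one inspection run. A minimal sketch, assuming four of the listed variables (inspector, sensor, calibration repetition, test repetition) with hypothetical identifiers, and treating loading/unloading and specimen positioning as happening at every run. Shuffling the run order is an added assumption, meant to keep inspectors from anticipating the specimen sequence.

```python
import itertools
import random


def build_test_matrix(inspectors, sensors, calibrations, repetitions, seed=None):
    """One run per combination of test variables; specimens assumed reloaded for every run.

    The shuffled run order is an assumption added for illustration.
    """
    runs = [
        {"inspector": i, "sensor": s, "calibration": c, "repetition": r}
        for i, s, c, r in itertools.product(inspectors, sensors, calibrations, repetitions)
    ]
    random.Random(seed).shuffle(runs)
    return runs


matrix = build_test_matrix(
    inspectors=["INSP-A", "INSP-B", "INSP-C"],  # hypothetical inspector IDs
    sensors=["SENSOR-1", "SENSOR-2"],           # hypothetical sensor IDs
    calibrations=[1, 2],                        # calibration repetitions
    repetitions=[1, 2],                         # test repetitions
    seed=42,
)
for run in matrix[:3]:
    print(run)
print(len(matrix), "runs")
```

The run count is the product of the levels (here 3 × 2 × 2 × 2 = 24), so each added level multiplies the test effort.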

5.2.1.2 Fixed Process Parameters

Fixed process parameters shall include, but
not be limited to, the following. Some of these
parameters  might  be  included  in  the  matrix  of 
test  variables. 

a.  Penetrating  fluid  formulation 

b.  Penetrating  fluid  application  method 

c.  Dwell  times 

d.  Emulsifier  formulation 
e. Emulsifier/remover application
method, concentration and contact
time

f. Developer formulation

g. Developer application method

h. Drying time and temperature

i. Pre- and post-rinse temperature and
time

j. Hardware and software configuration
control number


Date: ____ Operator ID: ____

Part Number ____ Serial Number ____ Alloy ____

Engine ____ Part Name ____ Surface Roughness ____

* Attach Specification Sheet
System Operating Ambient Temperature ____

State Other Equipment Environmental Constraints ____

Test Frequency ____ Scan Speed ____ Filtering ____

Horizontal Gain ____ Vertical Gain ____ Lift-Off Technique ____

Coil Output Impedance ____

Probe:
Contact ____ Noncontact ____
Differential ____ Absolute ____ Others ____
Pancake ____ Toroid Coil ____ Others ____
Coil Diameter ____ Shielding ____

Scanning Technique ____ Digitization ____

Calibration Level ____ Inspection Threshold ____

Attach a sketch of the inspection setup. Include part orientation with respect to flaw orientation and
eddy current direction.

Describe technique for analyzing, rejecting, and recording a defect signal.

Fig. 5-1 Eddy current data sheet


5.2.2 Specimen Fabrication and Maintenance

The  specimens  for  evaluation  of  PT  systems 
should  contain  Low  Cycle  Fatigue  (LCF) 
surface  connected  cracks.  The  cracks  should 
be  generated  and  measured  as  described  in 
Section  4.3.2.  Because  PT  indications  are 
more  dependent  on  crack  length  than  area, 
these  cracks  should  be  described  by  their 
surface  length. 

The  specimens  should  have  the  cracks 
oriented  and  positioned  randomly  relative  to 
the  edges  of  the  specimens,  to  minimize  the 
tendency of a manual inspector to "learn the
specimens".  The  inspectors  should  not  know 
in  advance  if  a  particular  specimen  is  cracked, 
or  if  it  is,  they  should  not  know  the  location, 
orientation,  or  size  of  the  crack. 

Particularly  for  manual  readers,  it  is  important 
that  a  significant  portion  of  the  samples  be 
crack-free,  to  help  assess  the  false  call  rate 
that  will  be  associated  with  a  particular 
inspection  capability. 
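The false call rate can be estimated from the crack-free specimens as false calls divided by crack-free inspection opportunities. Because these counts are usually small, a one-sided upper confidence bound says more than the point estimate alone. A sketch, assuming each crack-free specimen inspection is an independent trial with a common false-call probability, using the exact (Clopper-Pearson) binomial bound; the function names and counts are hypothetical.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x + 1))


def false_call_rate(false_calls, opportunities, confidence=0.95):
    """Point estimate and exact one-sided upper confidence bound (Clopper-Pearson).

    opportunities: number of inspections of crack-free specimens (or sites).
    Assumes independent trials with a common false-call probability.
    """
    p_hat = false_calls / opportunities
    if false_calls >= opportunities:
        return p_hat, 1.0
    alpha = 1.0 - confidence
    lo, hi = p_hat, 1.0
    # binom_cdf falls as p rises; bisect for the p at which it equals alpha
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binom_cdf(false_calls, opportunities, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return p_hat, hi


# Hypothetical result: 2 false calls in 60 crack-free specimen inspections
p_hat, upper = false_call_rate(2, 60)
print(f"false call rate {p_hat:.3f}, 95% upper bound {upper:.3f}")
```

With zero false calls the bound reduces to 1 − α^(1/n), which shows why a large number of crack-free inspections is needed before a low false call rate can be claimed.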

Specimen  maintenance  is  an  issue  for  PT 
specimens,  since  inspection  materials  are 
being  introduced  into  the  cracks  themselves.  It 
is  important  that  the  specimens  be  thoroughly 
cleaned  after  each  inspection.  This  cleaning 
should  use  an  ultrasonic  bath  of  heated 
acetone  to  assure  that  the  penetrants  are 
removed  from  the  cracks. 

Care  must  also  be  taken  to  assure  that  the 
chemicals  in  the  inspection  materials  are  not 
harmful  to  the  specimens.  The  presence  of 
such  elements  as  sulfur  is  potentially  harmful 
to  some  superalloys,  and  must  be  avoided. 

All  inspection  materials  and  cleaning 
procedures  must  be  carefully  documented  as 
a  part  of  the  test  plan. 

5.2.3  Testing  Procedures 

5.2.3.1  Test  Definition 

Procedures  shall  be  written  prior  to  the  test, 
clearly  describing  what  tests  are  to  be 
conducted,  and  the  exact  procedures  for 
conducting  them.  They  should  be  to  the  same 
level  of  detail  as  the  day-to-day  procedures  to 
which  production  inspectors  operate.  In 
addition  to  those  items  outlined  in  5.2.1, 
other  items  to  be  specified  in  this  test 
definition  are  the  following: 

1. To assure specimen integrity, the
specimens  should  be  subject  only  to 
cleaning  using  chemicals  that  will  not 
degrade  the  specimen  surface  or 
crack  characteristics.  An  ultrasonic 
cleaning  may  be  necessary  to  assure 
that  all  penetrant  material  has  been 
removed  from  the  cracks. 

2.  The  definition  of  the  system  to  be 
evaluated  is  critical  at  this  point,  to 
determine  the  controls  being  applied 
to  the  part  processing.  If  the  system 
being evaluated is a penetrant
preprocessor (i.e., applies the
penetrant, perhaps the emulsifier and
developer), the test is to determine
the effect of that system on the
inspection results, so the system
must be considered to include the
reader. Similarly, if the test is to
evaluate new penetrant chemicals, the
system definition must also include
the reader. If the component being
evaluated is the reader (e.g., an
automatic reader, as opposed to a
manual one), the system may be defined
more  restrictively,  and  include  only  the 
reader.  This  assumes  that  it  will  be 
put  in  production  without  any  changes 
to  the  existing  pre-processing 
procedures.  In  this  case,  the 
evaluation  should  be  conducted  with 
no special controls applied to the pre-processing,
and with production
inspectors  following  their  usual 
procedures.  If  it  is  intended  to  tighten 
control  of  production  pre-processing 
procedures,  it  will  be  necessary  to 
consider  the  system  being  evaluated 
as  including  all  of  the  pre-processing 
activities  as  well  as  the  reader  itself. 

3.  System  inspector  requirements  will 
typically  refer  to  certification  and 
training  requirements,  but  will  also 
include  the  number  of  inspectors  to  be 
included  in  the  test  plans.  Because 
of  the  larger  scatter  historically  seen  in 
PT,  this  is  an  important  criterion.  For 
automated  PT  readers,  it  may  be 
practical  to  reduce  the  number  of 
inspectors  as  detailed  in  Section  4.2. 

4.  Inspection  materials  used  will  be  a 
significant  factor  in  the  evaluation  of 
PT  systems,  and  as  such  must  be 
specified  in  the  test  plan.  In  many 
cases  the  materials  (penetrants, 
emulsifiers,  and  developers)  will  be 
the  subject  of  the  evaluations.  The 
chemicals  used,  their  concentrations, 
and  application  will  need  to  be 
detailed  in  the  test  procedure.  The 
criteria  used  for  the  acceptance  of  the 
chemicals (e.g., concentrations,
viscosity, etc.) must be those that are
planned  for  production  use. 

5.  The  sensor  in  PT  inspections  should 
be  considered  to  include  the  light 
source  as  well  as  the  detector.  The 
detector  may  be  the  person  inspecting 
the  specimens,  or  it  may  be  a 
camera/computer arrangement. In
any  case,  the  sensor  should  be 
typical  of  that  to  be  used  in  production 
inspections,  and  should  meet  all  of  the 
calibration  requirements  specified  for 
that  equipment.  In  the  case  of  the 
human  inspector,  that  calibration  may 
relate  to  his  level  of  NDE  certification; 
for  the  light  source,  it  may  be  intensity 
measured  at  some  specified  distance 
from  the  source;  for  the 
camera/computer  system  it  may  be 
tied  to  a  software  configuration  control 
procedure  and  to  filter  types. 

6. Inspection setup/calibration
requirements  must  be  the  same  as 
those  used  for  production  inspections, 
including  the  same  tolerances  and 
settings  as  may  be  appropriate  for 
automated  readers. 

7.  During  the  evaluation  tests,  the 
production  inspection  process  must  be 
duplicated  as  much  as  possible. 
Settings,  such  as  the  time  of 
penetrant  application,  dwell  time, 
rinse  time,  etc.,  all  should  follow 
production  procedures.  The  methods 
of  application  (dip,  spray,  electrostatic 
spray, etc.) also must match those
planned for production. Scanning
procedures  also  must  be  described, 
including  parameters  such  as 
distances  of  the  light  source  and  of 
the  detector  from  the  part  or 
specimen.  Particularly  for  the 
automated  readers,  the  software 
version  and  revision  numbers  must  be 
detailed.  Because  the  cracked 
specimens  are  not  the  same  as  real 
hardware  to  be  inspected  in 
production,  the  scanning  motions  for 
specimens  may  not  be  the  same  as 
those  for  real  components.  Efforts 
should  be  made  to  minimize  the 
differences,  and  recognized 
differences  should  be  documented. 
Because  the  specimens  will  not 
provide  the  same  line-of-sight  or 
contour  following  difficulties  as  some 
of the actual production components,
it  is  important  that  the  evaluation  plans 
include  some  real  components  with 
fluorescent  markings. 

8.  Inspection  thresholds  used  in  the  test 
should  be  the  same  as  those  planned 
for production use. With automated
readers, this may be set in the signal
processing  software,  and  as  long  as 
the  signal  processing  software  is  kept 
constant,  the  thresholds  will  be  the 
same.  For  the  manual  reader,  the 
scanning  procedure  in  the  test  should 
reflect  production  procedures  as 
closely as possible (e.g., if an
inspector  would  normally  scan  at  a 
rate  of  10  square  inches  per  second 
without  magnification,  then  during  the 
tests  he  should  not  focus  for 
prolonged  periods  on  a  6  square  inch 
specimen,  or  use  a  magnifier).  If  the 
manual  reader  sees  fluorescent 
indications  that  he  does  not  call  out  as 
cracks  in  the  specimen,  he  should  be 
prepared  to  explain  why  he  did  not  call 
them  out.  This  is  done  to  minimize 
the  effect  of  inspectors  "learning  the 
specimens". 


5.2.3.2  Test  Environment 

The environment in which the test is run
should match the anticipated production
environment as closely as possible, and the test
should be conducted at the production site if possible. If
the system is a new development, the initial
tests may need to be conducted at the
manufacturer's facility. To the extent possible,
production conditions should be met. It is
suggested  that  the  manufacturer  conduct  a 
first  evaluation  prior  to  shipping  the  equipment 
and  a  second  test  one  or  two  months  after 
the  system  is  installed  on  site. 

5.2.4  Presentation  of  Results 

Documentation  of  test  results  should  include 
all  raw  data  from  the  tests.  If  some  of  the  data 
is  classed  as  irrelevant  and  not  included  in  the 
data  reduction  process,  this  must  be  noted, 
and  an  explanation  given  for  why  this  decision 
was  made.  This  provides  the  customer  the 
option  of  accepting  or  rejecting  that  rationale. 

Data  for  the  permanent  record  of  fluorescent 
penetrant  testing  reliability  experiments  will  be 
submitted  in  accordance  with  the  requirements 
stated  in  Section  4.6.  Figure  5-2  presents  an 
example  of  the  type  of  information  required  for 
description  of  penetrant  testing  systems.  The 
PT  inspection  results  are  recorded  in  the 
hit/miss  format  for  manual  inspections,  and 
should be in the â vs. a format for automated
readers.  The  data  are  analyzed  accordingly 
(see  Appendices  C-2  and  C-3). 
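For the hit/miss records from manual PT reading, a POD curve is fitted to binary outcomes (Appendix C-3). As a sketch only, assuming one common choice, a log-odds model in ln a, POD(a) = 1/(1 + exp(−(β0 + β1 ln a))), fitted by maximum likelihood with Newton-Raphson iterations. The function names and data are hypothetical, and confidence bounds are omitted.

```python
import math


def pod(a, b0, b1):
    """Log-odds POD model: 1 / (1 + exp(-(b0 + b1 * ln a)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(a))))


def fit_hit_miss_pod(sizes, hits, iterations=50):
    """Maximum-likelihood fit of b0, b1 to hit/miss data by Newton-Raphson (hypothetical helper).

    sizes: crack surface lengths; hits: 1 if the crack was called, 0 if missed.
    Hits and misses must overlap in size; perfectly separated data have no finite fit.
    """
    x = [math.log(a) for a in sizes]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, hits):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w              # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < 1e-10 and abs(d1) < 1e-10:
            break
    return b0, b1


# Hypothetical manual PT results: crack surface length (in.) and hit (1) / miss (0)
sizes = [0.005, 0.008, 0.010, 0.012, 0.015, 0.018, 0.020, 0.025, 0.030, 0.040]
hits = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_hit_miss_pod(sizes, hits)
a50 = math.exp(-b0 / b1)                   # size with POD = 0.5
a90 = math.exp((math.log(9.0) - b0) / b1)  # size with POD = 0.9
print(f"a50 = {a50:.4f} in., a90 = {a90:.4f} in.")
```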


Date: ____ Operator ID: ____

Part Name ____ Part Number ____ Serial Number ____

Alloy ____ Engine ____

Penetrant System Model ____ Manufacturer & Date ____

* Attach specification sheet

Inspection Setup - Describe procedure including:

a. Precleaning method

b. Penetrant manufacturer & type. State contact angle.

c. Removal method - State water conditioning and sulphur and halogen content.

d. Drying temperature and time

e. Developer application and time. State manufacturer.

f. Inspection method

g. Post-cleaning method

Defect Evaluation - State technique for analyzing, rejecting, and recording a defect indication.

Fig. 5-2 Liquid penetrant test data sheet


5.3  Ultrasonic  Testing  Systems  (UT) 

5.3.1  Demonstration  Design 

5.3.1.1 Test Parameters

The  demonstration  design  for  the  capability 
and  reliability  study  of  the  ultrasonic  testing 
system  shall  include,  but  not  be  limited  to,  the 
following  test  variables.  These  requirements 
are  in  addition  to  those  listed  in  Section  4.3. 

a.  Inspector  Changes 

b.  Sensor  Changes 

c.  Loading/unloading  of  specimens 

d.  Calibration  Repetition 

e.  Inspection  Repetition 

5.3.1.2 Fixed Process Parameters

Fixed process parameters should mirror actual
production inspections and shall include, but
not be limited to, the following. Some of these
parameters might be included in the matrix of
test variables.

a. Test frequency (instrument and
transducer)

b. Pulser settings, damping, gain,
frequency

c. Receiver settings, gain, frequency

d. Transducer size and type

e. Calibration standards (material,
artificial defect size, metal travel)

f. Water path

g. Digitization rate and resolution, if
applicable

h.  TCG  setup 

i.  Gate  parameters 

j. Scanning Technique

1)  Scanning  speed 

2)  Index  value 

k.  Incident  angle  of  ultrasound 

l.  Threshold  setting 

m.  Wave  mode  (shear,  longitudinal, 
surface,  Lamb,  etc.) 

5.3.2 Specimen Fabrication and Maintenance

Ultrasonic inspection may use one or more of
several inspection modes, including surface,
longitudinal, or shear wave. These will
require different test specimens, the specifics
of which will depend upon the inspection
requirements. Typically, the surface wave
inspections may use the same specimens as
are used for ET (Section 5.1.2) with LCF
surface  connected  cracks.  The  size 
characterizations  of  the  specimens  used  for 
ET  may  also  be  used  for  UT  surface  wave. 

The  use  of  surface  wave  UT  assumes  that 
the  orientation  of  the  cracks  is  known,  so  the 
specimens  may  have  the  orientation  of  the 
cracks  defined  (although  the  inspectors  should 
not  know  if  a  particular  specimen  is  cracked, 
or  the  location  or  sizes  of  the  cracks). 

Longitudinal  and  shear  wave  UT  inspections 
would  typically  be  evaluated  using  flat-bottom 
holes  (FBH)  at  various  depths  from  the  entry 
surface  of  the  specimen.  The  capability  is  then 
quoted  in  terms  of  the  detectability  of  the 
various  sizes  of  FBH  at  the  different  depths. 
Since  the  surface  condition  of  the  specimen 
can  significantly  affect  this  detectability,  the 
specimen  surface  condition  should  simulate 
that  of  the  parts  to  be  inspected.  If  this  surface 
condition  is  not  known,  the  specimens  may  be 
made  with  a  very  good  surface  finish,  and 
inspection  of  the  typical  production 
components  may  be  used  to  evaluate  the 
expected noise. The flat bottom holes should
be drilled normal to the direction of sound
propagation  for  the  wave  mode  being 
evaluated.  Hole  sizes  may  be  established  by 
replication  of  the  diameter  and  depth.  Since 
material  type  and  processing  history  critically 
affect  the  inspection  capability,  again,  efforts 
should  be  made  to  assure  that  the  material  is 
typical  of  that  anticipated  for  the  production 
components. 

Another specimen type that can be used
contains internal defects in diffusion bonded
specimens as described in Appendix B.2.3.
These  defects  can  be  used  to  simulate 
mal-oriented  defects,  such  as  might  arise  from 
internal  crack  growth.  Specimens  should  be 
made  with  the  defects  widely  spaced,  to  avoid 
inspecting the entire specimen in an artificially
severe  evaluation  mode.  Placement  of  the 
defects  near  geometric  discontinuities  should 
be  done  only  if  that  is  specifically  what  is  being 
evaluated.  Care  should  be  taken  that  the 
defects  are  not  so  close  together  that  their  UT 
signals  interact.  Flaws  at  greater  depths 
require  greater  separation  than  those  closer  to 
the  surface.  The  proximity  of  the  defects  that 
is  allowed  is  a  function  of  the  depth  of  the 
defect  from  the  entry  surface,  as  the  deeper 
the  defect,  the  greater  the  sound  beam  will 
spread  before  it  reaches  the  defect. 
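The dependence of allowable defect spacing on depth can be estimated from the beam spread. A rough sketch, assuming a flat (unfocused) circular transducer in contact with the specimen at normal incidence, the far-field half angle sin θ = k λ/D (k ≈ 1.22 to the first null, roughly 0.51 to the −6 dB edge), and a conservative beam width of D + 2z tan θ at depth z. Near-field effects, focusing, and refraction through a water path are ignored; the function names and example values are hypothetical, and using this width as the minimum spacing is a rule of thumb rather than a requirement of this procedure.

```python
import math


def beam_half_angle_deg(diameter_mm, frequency_mhz, velocity_m_s, k=1.22):
    """Far-field half angle of a flat circular transducer: sin(theta) = k * wavelength / D."""
    wavelength_mm = velocity_m_s / (frequency_mhz * 1e6) * 1000.0
    ratio = k * wavelength_mm / diameter_mm
    if ratio >= 1.0:
        raise ValueError("wavelength too large for this transducer diameter")
    return math.degrees(math.asin(ratio))


def min_defect_spacing_mm(depth_mm, diameter_mm, frequency_mhz, velocity_m_s, k=1.22):
    """Conservative beam width D + 2 z tan(theta) at the defect depth z.

    Contact, unfocused, normal incidence; near field, focusing and
    water-path refraction are ignored.
    """
    theta = math.radians(beam_half_angle_deg(diameter_mm, frequency_mhz, velocity_m_s, k))
    return diameter_mm + 2.0 * depth_mm * math.tan(theta)


# Hypothetical 10 MHz, 6.35 mm (0.25 in.) transducer; ~5900 m/s longitudinal velocity assumed
for depth in (5.0, 20.0, 50.0):
    spacing = min_defect_spacing_mm(depth, 6.35, 10.0, 5900.0)
    print(f"depth {depth:4.1f} mm: keep defects at least {spacing:.1f} mm apart")
```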

Specimen  maintenance  should  require  no 
specific  precautions,  with  the  only  exception 
being  the  need  to  assure  that  the  couplant  will 
not  degrade  the  specimen  material. 

5.3.3  Testing  Procedure 

5.3.3.1  Test  Definition 

Procedures  shall  be  written  prior  to  the  test, 
clearly  describing  what  tests  are  to  be 
conducted,  and  the  exact  procedures  for 
conducting  them.  They  should  be  to  the 
same  level  of  detail  as  the  day-to-day 
procedures to which production inspectors
operate.  In  addition  to  those  items  outlined 
in 5.3.1, other items to be specified in this
test  definition  are  the  following: 

1. Part pre-processing requirements
should be limited to cleaning the
specimens,  and  to  the  application  of 
the  couplant  as  appropriate. 

2.  System  inspector  requirements  will 
frequently  refer  to  qualification  and 
training  requirements,  but  will  also 
include  the  number  of  inspectors  to  be 
included  in  the  test  plan.  At  the  start  of 
the  test  matrix,  this  may  typically  call 
for  three  inspectors  to  be  involved  in 
the  system  evaluations.  This  number 
may be reduced (see Section 4.2).

3. Inspection materials (e.g., couplant) are
not  significant  variables. 

4. The test plan should require the
evaluation of the system using at least
two samples of each distinct
transducer  planned  for  production  use 
(including  factors  such  as  focal  length 
and  frequency).  The  probe  body,  and 
the  use  of  such  things  as  reflectors, 
need  to  be  factors  in  this  evaluation 
only  to  the  extent  necessary  to  allow 
inspection  of  the  specific  specimen 
designs. 

5.  Inspection  setup/calibration  must  be 
conducted  using  the  same  procedures 
and  calibration  standards  planned  for 
use  in  production.  The  signal 
responses  must  be  set  to  the  same 
values,  with  the  same  tolerances  in 
both  situations.  The  production 
inspection  process  must  be  duplicated 
in  the  test  as  much  as  possible.  Thus 
the  inspection  feed  rates,  scan  index 
rates,  drive  signal  frequencies,  filter 
settings,  water  path  distances,  and 
any  signal  processing  must  be  the 
same.  Because  the  specimens  are 
not  the  same  as  real  components  to 
be  inspected  in  production,  the 
scanning  motions  for  the  specimens 
may  not  be  the  same  as  those  used 
for  components.  Efforts  should  be 
made  to  minimize  the  differences,  and 
recognized  differences  should  be 
documented. 

6.  Inspection  thresholds  used  in  the  test 
should  be  the  same  as  those  planned 
for  production  use.  Inspection  of  the 
actual  fatigue  cracked  hardware 
described in Section 4.3.2.4 will help
to  establish  how  realistic  those 
thresholds  are  for  production 
inspections.  Where  the  specific 
application  of  the  system  is  known, 
typical  production  components  should 
be  used  to  determine  practical 
thresholds.  It  may  be  desirable  to 
inspect  the  specimens  at  as  low  a 
threshold  as  possible,  to  establish  the 
detection  capabilities  as  a  function  of 
the  thresholds  used.  This  will  allow 
trade-offs  to  be  made  between 
detection  capability  and  production 
throughput. 
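Item 5 requires the test setup to reproduce the production signal responses within the production tolerances. A minimal sketch of that check, assuming the production procedure can be written as a list of calibration reflectors, each with a nominal response and a ± tolerance; the reflector names, values, and units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CalibrationPoint:
    """Production setting for one calibration reflector (hypothetical structure)."""
    name: str
    nominal: float    # target response, e.g. % full screen height
    tolerance: float  # allowed +/- deviation from nominal


def out_of_tolerance(production_points, measured):
    """Return names of reflectors whose test-setup response falls outside nominal +/- tolerance."""
    return [
        p.name
        for p in production_points
        if abs(measured[p.name] - p.nominal) > p.tolerance
    ]


production = [
    CalibrationPoint("reflector_1", nominal=80.0, tolerance=5.0),
    CalibrationPoint("reflector_2", nominal=80.0, tolerance=5.0),
]
test_setup = {"reflector_1": 78.0, "reflector_2": 86.0}
failures = out_of_tolerance(production, test_setup)
if failures:
    print("Re-calibrate before testing:", ", ".join(failures))
```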

5.3.3.2 Test Environment

The environment in which the test is run
should match the anticipated production
environment as closely as possible, and the test
should be conducted at the production site if possible. If
the  system  is  a  new  development,  the  initial 
tests  may  need  to  be  conducted  at  the 
manufacturer's  facility.  To  the  extent  possible, 
production  conditions  should  be  met.  It  is 
suggested  that  the  manufacturer  conduct  a 
first  evaluation  prior  to  shipping  the 
equipment  and  a  second  test  one  or  two 
months  after  the  system  is  installed  on  site. 

5.3.4  Presentation  of  Results 

Documentation  of  test  results  should  include 
all  raw  data  from  the  tests.  If  some  of  the  data 
is  classed  as  irrelevant  and  not  included  in  the 
data  reduction  process,  this  must  be  noted, 
and  an  explanation  given  for  why  this  decision 
was  made.  This  provides  the  customer  the 
option  of  accepting  or  rejecting  that  rationale. 

Data  for  the  permanent  record  of  ultrasonic 
testing  reliability  experiments  will  be  submitted 
in  accordance  with  the  requirements  stated  in 
Section 4.6. Figure 5-3 presents an example
of the type of information required for
description of ultrasonic testing systems. The
UT inspection results should be recorded in
the â vs. a format whenever possible.
However, when the inspection mode does not
quantify the flaw area (e.g., shear wave
detecting a corner of a crack), the hit/miss
format  is  necessary.  The  data  are  analyzed 
accordingly  (see  Appendices  C-2  and  C-3). 

5.4  Magnetic  Particle  Testing 

5.4.1  Demonstration  Design 

5.4.1.1 Test Parameters

The  demonstration  design  for  the  capability 
and  reliability  study  of  the  magnetic  particle 
inspection  system  shall  include,  but  not  be 
limited  to,  the  following  test  variables.  These 
requirements  are  in  addition  to  those  listed  in 
Section  4.2. 

a.  Inspector  Changes 

b.  Sensor  Changes 

c. Loading/Unloading of Specimens

d.  Calibration  Repetition 

e.  Inspection  Repetition 

Part Number ____ Serial Number ____ Date ____

Engine ____ Part Name ____ Operator ID ____

Alloy ____ Surface Roughness ____

Equipment Model ____ Manufacturer & Date ____

* Attach Specification Sheet
System Operating Ambient Temperature ____

Other System Operating Environmental Constraints ____

Pulser: Frequency ____ Voltage ____ Rise Time ____ Damping ____ Pulse Width ____

Receiver: Frequency ____ Gain ____ Filtering ____

Monitor Gate: Delay ____ Width ____

Time Compensated Gain: Attach Graph - Gain versus Time

Transducer: Manufacturer ____ Frequency (of the finished transducer, measured with a frequency analyzer) ____
Date ____ Level ____ Shelf Life ____ Piezoelectric Disk Material ____ Disk Diameter ____

Type:
Contact ____ Couplant ____
Immersion: Unfocused ____ Focus ____ Focus Distance ____ Operating Water Path ____
Angled ____ Couplant ____ Wedge Material ____

Mode of Operation: Longitudinal ____ Transverse ____ Surface ____

Scanning Technique ____ Digitization ____

Calibration Level ____ Inspection Threshold ____

Attach a sketch of the inspection setup. Include part orientation with respect to flaw
orientation and ultrasonic beam direction.

Fig. 5-3 Ultrasonic test data sheet


5.4.1.2 Fixed Process Parameters

Fixed process parameters shall include, but
not be limited to, the following. Some of these
parameters may be included in the matrix of
test variables, if desired.

a.  Magnetic  suspension  formulation  and 
concentration 

b.  Magnetic  current  for  a  particular  part 
number 

c.  Demagnetizing  procedure 

d.  Method  of  magnetization  (circular  or 
longitudinal) 

e. Method (e.g., fluorescent or visible)

5.4.2 Specimen Fabrication and Maintenance

The  specimens  for  evaluation  of  MT  systems 
should  contain  LCF  surface  connected  cracks. 
The  cracks  should  be  generated  and 
measured  as  described  in  Section  4.3.2. 
Specimen geometry and material should
represent the production components.

It  is  important  that  the  specimens  be  treated 
carefully  to  prevent  corrosion.  They  should 
be  thoroughly  cleaned  after  each  use.  Care 
must  be  taken  to  assure  that  the  chemicals  in 
the  inspection  materials  do  not  degrade  the 
specimen  material.  The  presence  of  some 
elements,  such  as  sulfur,  may  be  harmful  to 
some  alloys,  and  must  be  avoided.  All 
inspection  materials  and  cleaning  procedures 
must  be  carefully  documented  as  a  part  of  the 
test  plan. 

5.4.3  Testing  Procedures 

5.4.3.1 Test Definition

Procedures  shall  be  written  prior  to  the  test, 
clearly describing what tests are to be
conducted,  and  the  exact  procedures  for 
conducting  them.  They  should  be  to  the 
same  level  of  detail  as  the  day-to-day 
procedures  to  which  production  inspectors 
operate. In addition to those items outlined in
5.4.1, other items to be specified in this test
definition  are  the  following: 

1. To maintain specimen integrity, the
specimens  should  be  subject  only  to 
cleaning  using  chemicals  that  will  not 
degrade  the  specimen  surface  or 
crack  characteristics. 


2.  The  definition  of  the  system  to  be 
evaluated  is  critical  to  a  determination 
of  the  controls  to  be  applied  to  the 
part  processing.  If  the  system  being 
evaluated is a preprocessor (i.e.,
applies the current and the particle
material to the component), the test is
to  determine  the  effect  of  that  system 
on  the  inspection  results,  so  the 
system  must  be  considered  to  include 
the  reader.  Similarly,  if  the  test  is  to 
evaluate  new  particle  materials,  the 
system  definition  must  include  the 
reader.  If  the  component  being 
evaluated is the reader (e.g., an
automated reader, as opposed to a
manual one), the system may be
defined more restrictively, and include
only  the  reader.  This  assumes  that  it 
will  be  put  into  production  without  any 
changes  to  the  existing  preprocessing 
procedures.  In  this  case,  the 
evaluation  should  be  conducted  with 
no  special  controls  applied  to  the 
pre-processing,  and  with  production 
inspectors  following  their  usual 
procedures.  If  it  is  intended  to  tighten 
control  of  production  pre-processing 
procedures,  it  will  be  necessary  to 
consider  the  system  being  evaluated 
as  including  all  of  the  pre-processing 
activities  as  well  as  the  reader  itself. 

3. Inspector requirements refer to
certification and training requirements, and
also will include the number of
inspectors to be included in the test
plans. Because of the scatter
historically associated with what has
been a very operator-dependent
inspection, this is an
important criterion. For automated
readers,  it  may  be  practical  to  reduce 
the  number  of  inspectors  as  detailed 
in  paragraph  4.2. 

4.  Inspection  materials  used  will  be  a 
significant  factor  in  the  evaluation  of 
MT  systems,  and  as  such  must  be 
specified  in  the  test  plan.  In  many 
cases  the  materials  themselves  will  be 
the  subject  of  the  evaluations.  The 
chemicals  used,  their  concentrations, 
agitation,  and  their  application  will 
need  to  be  detailed  in  the  test 
procedure.  The  criteria  used  for  the 
acceptance  of  these  materials  must 
be  those  that  are  planned  for 
production  use. 

5. The sensor in MT inspections should
be  considered  to  include  the  light 
source  as  well  as  the  detector.  The 
detector  may  be  the  person  inspecting 
the  specimens,  or  it  may  be  a 
camera/computer  arrangement.  In 
any  case,  the  sensor  should  be 
typical  of  that  to  be  used  in  production 
inspections,  and  should  meet  all  of  the 
calibration  requirements  specified  for 
that  equipment.  In  the  case  of  the 
human  inspector,  that  calibration  may 
relate  to  his/her  level  of  certification; 
for  the  light  source,  it  may  be  intensity 
measured  at  some  specified  distance 
from  the  source;  for  the  camera/ 
computer  system  it  may  be  tied  into  a 
software  configuration  control 
procedure  and  to  filter  types. 

6.  Inspection  setup/calibration 
requirements  must  be  the  same  as 
those  used  for  production  inspections, 
including  the  same  tolerances  and 
settings  as  may  be  appropriate  for 
automated  readers. 

7.  During  the  evaluation  test,  the 
production  inspection  process  must  be 
duplicated  as  much  as  possible. 
Settings such as the current, direction of current flow, particle application and agitation, etc., should all follow production procedures. The methods of application must also match those planned for production. Scanning procedures must also be described,
including  parameters  such  as  distance 
of  the  light  source  and  of  the  detector 
from  the  part/specimen.  Particularly 
for  automated  readers,  the  software 
version  and  revision  numbers  must  be 
detailed.  Because  the  cracked 
specimens  are  not  the  same  as  real 
components  to  be  inspected  in 
production,  the  scanning  motions  for 
the  specimens  may  not  be  the  same 
as  those  used  for  the  components. 
Efforts  should  be  made  to  minimize 
the  differences,  and  recognized 
differences  should  be  documented. 
Because  the  specimens  will  not 
provide  the  same  line-of-sight  or 
contour-following  difficulties  as  will 
some  of  the  actual  production 
components,  it  is  important  that  the 
evaluation  plans  include  some  real 
production  components  with  artificial 
defects  such  as  EDM  notches. 


8.  Inspection  thresholds  used  in  the  test 
should  be  the  same  as  those  planned 
for  production  use.  With  automated 
readers,  this  may  be  set  in  the  signal 
processing  software,  and  as  long  as 
the  signal  processing  software  is  kept 
constant,  the  thresholds  will  be  the 
same.  For  the  manual  reader,  the 
scanning  procedure  in  the  test  should 
reflect  production  procedures  as 
closely as possible (e.g. if an inspector would normally scan at a rate of 10 square inches per second without magnification, then during the tests he should not focus for prolonged periods on a 6 square inch specimen, or use a magnifier). If the manual reader
sees  fluorescent  indications  that  he 
does  not  call  out  as  cracks  in  the 
specimen,  he  should  be  prepared  to 
explain  why  he  did  not  call  them  out. 
This  will  be  done  to  minimize  the 
effect  of  inspectors  "learning  the 
specimens". 

5.4.3.2  Test  Environment 

The environment in which the test is run should match the anticipated production environment as closely as possible, and the test should be conducted at the production site if possible. If the system is a new development, the initial tests may need to be conducted at the manufacturer's facility; to the extent possible, production conditions should still be met. It is suggested that the manufacturer conduct a first evaluation prior to shipping the equipment and a second test one or two months after the system is installed on site.

5.4.4  Presentation  of  Results 

Documentation  of  test  results  should  include 
all  raw  data  from  the  tests.  If  some  of  the  data 
is  classed  as  irrelevant  and  not  included  in  the 
data  reduction  process,  this  must  be  noted, 
and  an  explanation  given  for  why  this  decision 
was  made.  This  provides  the  customer  the 
option  of  accepting  or  rejecting  that  rationale. 
The MT inspection results are recorded in the hit/miss format for manual inspections, and should be in the â vs. a format for automated readers. The data are analyzed accordingly (see Appendices C-2 and C-3).

6.0  NOTES 

6.1  INTENDED  USE 

The  intended  use  of  this  document  is  to 
specify procedures for assessing NDE
inspection  capability  that  will  permit 
quantitative  comparison  of  one  system  with 
another  with  respect  to  known  specimen 
standards. 

6.2  DATA  REQUIREMENTS 

The data descriptions associated with the requirements of this document are to be found in the requirements of each individual contract.

6.3  RESPONSIBLE  ENGINEERING 
OFFICE 

The office responsible for the development and maintenance of this information, and of the USAF MIL-STD from which the data are derived, is ASC/ENFSA, Wright-Patterson Air Force Base, OH 45433; AUTOVON 785-3331, Commercial (513) 255-3331.

6.4  TRADE-OFFS  BETWEEN  IDEAL 
AND  PRACTICAL  DEMONSTRATIONS 

Ideally, the test designed according to this document should include all variables of concern in the test matrix, and the conditions found in real part inspections should be matched exactly. In reality, these constraints cannot always be met. For example, the number of different geometries in a complete engine, and the requirement that each be tested as suggested by the ideal test design, may drive testing costs and times to the point where such a test is impractical. The same situation can arise with test parameters, probes, and mechanical parameters; the number of parameters that could possibly be tested is immense. The solution to this problem is to let the terms "reasonable" and "representative" govern any concessions made to reality. The term "reasonable" argues for a balanced definition of the test, one that does not push too hard toward the ideal: important variables should be tested, while unimportant variables may not have to be. It implies avoiding extremes in testing and applying logical considerations when compromising. The term "representative" also argues for limiting the number of variables tested, but in a manner that gives reasonable representation of the real inspections. This philosophy of testing recognizes that not all variables will be tested, and accepts that some areas of inspection will be better than the test and some will be worse. By being reasonable and representative, a good quality test can be
designed  which  will  satisfy  cost  and  time 
constraints.  As  mentioned  elsewhere,  the 
final  test  design  must  be  submitted  to  the 
customer  for  approval,  and  becomes  part  of 
the  design  document. 

6.5  OTHER  TOPICS 

The  following  notes  are  included  as  examples 
of  on-going  work  related  to  NDE  system 
evaluation.  The  work  has  not  progressed 
sufficiently  to  include  these  topics  as 
standards,  yet  they  are  important  and  should 
be  considered  as  part  of  any  technical  update 
of  this  document. 

6.5.1  FALSE  CALL  ANALYSIS 

When an inspection stimulus is applied to a detail, the interpretation of the response
determines  whether  or  not  a  crack  is  judged  to 
be  present.  Presumably,  the  inspection 
system  is  designed  to  produce  a  clear, 
unambiguous  response  to  all  cracks  whose 
sizes  exceed  a  specified  value.  If  noise  (from 
whatever  source)  is  present  in  the  signal 
response,  false  indications  (false  calls)  can 
result  if  a  noise  response  from  a  non-cracked 
detail  is  interpreted  as  being  caused  by  a 
crack.  Although  false  indications  are 
undesirable  for  economic  reasons,  they 
cannot  be  entirely  eliminated  since  there  is  a 
trade-off between the rate of false indications
and  the  ability  to  detect  very  small  cracks. 

Rates  of  false  indications  are  currently 
quantified  by  a  count  of  the  number  of 
indications  that  are  given  at  locations  for  which 
no  known  crack  is  present.  There  have  been 
data  sets  for  which  the  false  call  rate  was  so 
high that very small "detected" cracks were
more  likely  to  be  false  indications  at  crack 
sites.  These  data  produced  POD(a)  functions 
that  did  not  adequately  model  the  observed 
results.  To  incorporate  the  simultaneous 
estimation  of  the  parameters  of  the  POD(a) 
function  and  the  false  call  rate,  a  modified 
analysis  is  being  considered.  This  new  model 
is  based  on  the  probability  of  obtaining  an 
indication  (rather  than  detection)  at  an 
inspection  site. 

Let POD(a) represent the probability of obtaining an indication in an inspection of a crack of size a. Let p represent the probability of a false indication for the inspection, which depends on the inspection method, the inspector, the calibration, etc. Then

POI(a) = p + POD(a) - Prob[false call and detection]

(Note that an inspection response signal could be such that both the crack response and the noise level would be large enough to produce a crack indication.) If detection and false indication are independent events, Prob[false call and detection] = p POD(a), so that

POI(a) = p + (1 - p) POD(a)

While  this  expression  may  be  a  reasonable 
model  for  the  joint  estimation  of  p  and  the 
parameters  of  the  POD(a)  function,  the 
implementation  of  the  model  by  maximum 
likelihood  is  not  straightforward.  Other 
approaches  to  estimating  the  parameters  and 
placing  confidence  limits  on  the  POD(a) 
function are being sought. At present, a maximum false call rate of 5% is suggested
to  ensure  proper  POD(a)  representation. 
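The independence model can be evaluated directly once p is known. In this sketch p is estimated from the count of indications at crack-free sites, as described above, and POD(a) is passed in as any callable. The function names and the example counts are hypothetical, and the joint maximum-likelihood estimation discussed in the text is not attempted.

```python
from typing import Callable

# The 5% ceiling on the false call rate suggested in the text.
MAX_FALSE_CALL_RATE = 0.05


def false_call_rate(indications_at_uncracked: int, uncracked_sites: int) -> float:
    """Estimate p as the fraction of crack-free sites that gave an indication."""
    if uncracked_sites <= 0:
        raise ValueError("need at least one uncracked site")
    return indications_at_uncracked / uncracked_sites


def poi(a: float, p: float, pod: Callable[[float], float]) -> float:
    """POI(a) = p + (1 - p) * POD(a), assuming false calls and detections are independent."""
    return p + (1.0 - p) * pod(a)


p = false_call_rate(3, 100)  # 3 indications at 100 crack-free sites -> 0.03
assert p <= MAX_FALSE_CALL_RATE
print(round(poi(0.010, p, lambda a: 0.8), 3))  # 0.806 for an assumed POD of 0.8
```

Because POI(a) never falls below p, a high false call rate places a floor under the apparent detection rate of even the smallest cracks, which is consistent with the poorly fitting data sets described above.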


6.5.2  POD  FROM  MULTIPLE 
INSPECTIONS 

Redundant  inspection  is  the  practice  of 
performing  multiple  inspections  on  a  single 
part.  The  philosophy  behind  multiple 
inspections  is  to  increase  the  probability  of 
detecting  a  flaw  which  may  exist.  If  the  POD 
fails  to  meet  CDRL  requirements,  it  may  be 
possible  to  use  redundant  inspections  to  shift 
the  POD  curve  and  its  lower  bound. 

Historically,  calculations  expressing  the 
benefits  of  redundant  fluorescent  penetrant 
inspection  have  been  made  assuming 
complete  independence  between  inspections. 
For  example,  if  the  probability  of  detecting 
(POD)  a  flaw  of  a  certain  size  is  0.9,  then  the 
probability  of  a  single  miss  (POM)  is  0.1, 
the probability of two (independent) misses is 0.1 x 0.1 = 0.01, and so the POD for two
inspections  is  1  -  0.01  =  0.99,  assuming 
independence. 
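The independence calculation generalizes to n inspections: a flaw is missed only if every inspection misses it. A minimal sketch (the function name is illustrative). Since inspections are usually dependent in practice, this figure is best read as an optimistic estimate.

```python
def pod_independent(pod_single: float, n: int) -> float:
    """POD after n inspections, assuming complete independence between them.

    The flaw escapes only if all n inspections miss it: POM**n.
    """
    if not 0.0 <= pod_single <= 1.0 or n < 1:
        raise ValueError("pod_single must be in [0, 1] and n >= 1")
    return 1.0 - (1.0 - pod_single) ** n


print(round(pod_independent(0.9, 2), 6))  # 0.99, matching the example above
```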

Unfortunately, most inspections have been found not to be independent from one inspection to the next. Sources of this dependency include inspecting the same crack twice (same location, size, etc.), having the same inspector examine the crack twice, and failing to restore the surface of the part, and the crack itself, to their initial state between inspections.


In  reality,  quantifying  the  POD  due  to  multiple 
inspections  requires  knowledge  of  this 
dependency.  For  double  inspections,  the 
calculation  is: 

POD(A or B) = POD(A) + POD(B) - POD(A and B)

where  these  POD  equations  are  calculated 
as  described  in  Appendix  C,  Modeling 
Probability  of  Detection,  and  where  A  and  B 
refer  to  two  inspectors. 

Assuming that inspector A and inspector B share the responsibility for flaw location equally, so that a single inspection has an average POD of 0.5 POD(A) + 0.5 POD(B), the difference between single and double inspections, allowing for inspection-to-inspection dependency, can be expressed as:

POD increase = (POD for double inspection) - (POD for single inspection)

= {POD(A) + POD(B) - POD(A and B)} - {0.5 POD(A) + 0.5 POD(B)}

= 0.5 POD(A) + 0.5 POD(B) - POD(A and B)
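The equations above can be evaluated from paired hit/miss results in which both inspectors examine the same set of cracks. This hypothetical sketch uses simple proportions at a single crack size rather than the fitted POD(a) functions of Appendix C, and the example data are invented.

```python
def double_inspection_gain(hits_a, hits_b):
    """Estimate the POD increase from a second inspector using paired hit/miss data.

    hits_a[i] / hits_b[i] are True if inspector A / B found crack i.
    Returns (POD(A), POD(B), POD(A and B), POD(A or B), increase), where
    increase = 0.5*POD(A) + 0.5*POD(B) - POD(A and B).
    """
    if len(hits_a) != len(hits_b) or not hits_a:
        raise ValueError("need equal-length, non-empty hit/miss lists")
    n = len(hits_a)
    pod_a = sum(hits_a) / n
    pod_b = sum(hits_b) / n
    # Joint detection captures the inspection-to-inspection dependency.
    pod_ab = sum(a and b for a, b in zip(hits_a, hits_b)) / n
    pod_a_or_b = pod_a + pod_b - pod_ab
    increase = pod_a_or_b - (0.5 * pod_a + 0.5 * pod_b)
    return pod_a, pod_b, pod_ab, pod_a_or_b, increase


# Invented example: 10 cracks of one size, inspected by both A and B.
a_hits = [True] * 8 + [False] * 2
b_hits = [True] * 7 + [False, True, False]
print([round(v, 3) for v in double_inspection_gain(a_hits, b_hits)])
# [0.8, 0.8, 0.7, 0.9, 0.1]
```

Under independence, POD(A and B) would be 0.8 x 0.8 = 0.64 and the gain 0.16; the observed overlap of 0.7 reduces it to 0.1.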

This argument can be extended to more than two inspections, to a process parameter other than inspector, or to a system other than PT where redundant benefits may be needed.

For more details, see "Quantifying the Benefits of Redundant Fluorescent Penetrant Inspection," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 8B, pp. 2221-2228.

6.5.3  INSPECTION  OF  EDM-NOTCHED 
PARTS 

System  Probabilities  of  Detection  (PODs) 
established  using  the  procedures  of  this 
Standard  characterize  the  sensitivity  of  the 
system  to  the  flaws  in  the  specimens  tested. 
The  applicability  of  these  PODs  to  the 
inspection  of  actual  engine  hardware  is 
dependent  upon  the  extent  to  which  the 
specimens  mirror  the  actual  part  conditions. 
That  they  are  not  perfect  reflections  is  due  to 
limitations  in  such  factors  as: 

1. Full part geometry is not reproduced (e.g. dovetail slant, part radius curvature).

2. System manipulation routines are different (since full parts are not tested).

3. Only typical geometries are represented; a full set of all features inspected is prohibitively expensive.

4. It may be difficult to initiate defects in the specimens that duplicate the positions, sizes, and shapes of flaws that are the targets of the part inspections.

To estimate how directly the established POD curves may be applied to the inspections of the parts, it is appropriate to inspect actual engine hardware with artificial
flaws  machined  in  the  critical  locations.  Note 
that  the  purpose  of  this  test  is  not  to  modify 
the  PODs  already  generated,  but  to  evaluate 
their  applicability  to  production  inspections. 

The rest of this discussion uses eddy current inspection of EDM-notched parts as an example. The notches used for these tests may be sized to provide an â that can be referenced to the calibration, or to provide eddy current â values approximately equal to those of the crack sizes to be detected in the production inspections. The steps in establishing the size of this notch are as follows:

1. Determine the inspection goal (e.g. detection of a 0.010" crack in the part).

2. Determine from the POD testing the average â of this size crack in the specimen (e.g. 100 counts).

3. Machine notches of several sizes in specimen blanks to determine the notch size that yields an â at the same 100-count level (interpolation on a log-log plot may be necessary).
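The log-log interpolation in step 3 can be done numerically as well as graphically. A minimal sketch, assuming the response behaves like a power law of notch size between adjacent measurements, so that it plots as a straight line on log-log axes; the function name and example values are hypothetical.

```python
import math


def notch_size_for_target(notch_sizes, notch_responses, target_response):
    """Interpolate on log-log axes for the notch size giving target_response.

    notch_sizes and notch_responses are paired measurements from specimen
    blanks, sorted by size, with responses increasing with size.
    """
    pts = list(zip(notch_sizes, notch_responses))
    for (s0, r0), (s1, r1) in zip(pts, pts[1:]):
        if r0 <= target_response <= r1:
            if r1 == r0:
                return s0
            # Straight line between (ln s0, ln r0) and (ln s1, ln r1).
            t = (math.log(target_response) - math.log(r0)) / (math.log(r1) - math.log(r0))
            return math.exp(math.log(s0) + t * (math.log(s1) - math.log(s0)))
    raise ValueError("target response is outside the measured range")


# Illustrative measurements: ahat (counts) from notches in specimen blanks.
sizes = [0.005, 0.010, 0.020]  # notch size, inches
counts = [40.0, 160.0, 640.0]
print(round(notch_size_for_target(sizes, counts, 100.0), 5))  # 0.00791
```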

Notches may then be machined into the part features to be inspected. Significant variations of the notch â values from those expected may indicate that the POD curves established using the specimens are not directly applicable to those part features. The causes of such variations, and means of establishing representative PODs, should then be examined.


6.5.4  ILL-BEHAVED  DATA 

Because of an inadequate number of observations or an inappropriate range of flaw sizes, some inspection results contain little information and, taken by themselves, give nonsensical POD(a) curves. One possible approach in this situation would be simply to declare the data unusable, and this may ultimately prove to be the most prudent procedure. However, there is some engineering information contained within these observations. A better idea might be to extract that information, evaluate it in light of prior knowledge about similar inspection processes, and then decide whether more testing is required to augment or replace the data under consideration.

Bayesian  statistics  provides  the  framework  for 
this  analysis.  The  overall  plan  is  to  define  the 
likelihood  in  terms  of  the  observed  data  (as  is 
currently  done)  and  in  terms  of  the  expected 
parameter values, based on prior experience.
Parameter  estimates  can  then  be  selected 
such  that  this  new  likelihood  function  achieves 
a  maximum. 

For this approach to be effective, the influence of the prior information should be small when the data are well behaved, and only moderate otherwise. If the influence of the "prior" (as it is called) is too strong, what little information the data contain will be obscured and the entire exercise will be of no practical value. The prior, therefore, should provide stability to the estimates without undue influence on the final outcome.
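One way to realize this plan is sketched below under assumptions the text does not state: a log-logistic hit/miss model POD(a) = 1/(1 + exp(-(b0 + b1 ln a))), independent normal priors on b0 and b1 standing in for "prior experience", and Newton's method to find the parameter values that maximize the combined likelihood (the posterior). The prior standard deviations control how strongly the prior pulls on the result, which is the balance described above. All names are hypothetical; Appendix C may use a different POD model.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_pod_fit(sizes, hits, prior_mean, prior_sd, max_iter=100, tol=1e-10):
    """MAP estimate of (b0, b1) for POD(a) = 1 / (1 + exp(-(b0 + b1*ln a))).

    hits[i] is 1 (hit) or 0 (miss) for crack size sizes[i]. Independent normal
    priors N(m0, s0^2) on b0 and N(m1, s1^2) on b1 are added to the hit/miss
    log-likelihood, and the log-posterior is maximized by Newton's method.
    """
    xs = [math.log(a) for a in sizes]
    (m0, m1), (s0, s1) = prior_mean, prior_sd
    b0, b1 = m0, m1  # start from the prior mean
    for _ in range(max_iter):
        # Gradient (g) and Hessian (h) of the log-prior ...
        g0, g1 = -(b0 - m0) / s0**2, -(b1 - m1) / s1**2
        h00, h01, h11 = -1.0 / s0**2, 0.0, -1.0 / s1**2
        # ... plus those of the hit/miss log-likelihood.
        for x, y in zip(xs, hits):
            p = _sigmoid(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 -= w
            h01 -= w * x
            h11 -= w * x * x
        # Newton step b <- b - H^{-1} g, using the explicit 2x2 inverse.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - d0, b1 - d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

With all hits (data for which an unpenalized fit diverges), the prior keeps the estimates finite; with a very wide prior and well-behaved data, the result is essentially the ordinary maximum-likelihood fit.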
APPENDIX A

TEST  PROGRAM  GUIDELINES 


A.1  PURPOSE 

The  purpose  of  an  NDE  demonstration  is  to 
produce  a  POD(a)  curve,  and  lower  bound, 
which  accurately  represent  the  capability  of  an 
inspection  system.  This  is  accomplished  by 
recording  the  system  responses  which  result 
from  inspecting  flaws  of  known  sizes.  The 
mathematical  details  of  producing  a  POD(a) 
curve  are  discussed  in  Appendix  C.  Since 
the system response for ET, UT, PT, or MT is subject to variation in the input variables (e.g. probe, inspector, penetrant type), it may
be  necessary  to  determine  the  impact  of  these 
variables  on  the  system  response.  The  plan 
for  determining  the  best  estimate  of  the  overall 
POD(a)  curve  as  well  as  the  significance  of 
the  input  variables  is  called  an  NDE 
experimental  design. 

A.2  MAIN  EFFECTS  AND 
INTERACTIONS 

Main  effects  are  the  changes  in  the  NDE 
system  response  caused  by  the  input 
variables  acting  individually.  Main  effects  are 
additive.  An  interaction  occurs  between  two 
variables  if  the  effect  of  the  two  variables  is 
not  additive.  If  there  is  no  interaction,  then  a 
pattern  observed  at  a  low  level  of  a  factor 
should  result  in  the  same  pattern  at  the  high 
level. Pictorially this is shown in Figure A-1,
where  inspector  2  produces  a  higher 
response  than  does  inspector  1,  regardless 
of  which  probe  is  used,  and  probe  1  is  better 
than  probe  2  regardless  of  inspector. 

If  there  is  interaction,  then  this  pattern  doesn't 
exist.  This  is  illustrated  in  Figure  A-2.  Here 
inspector  1  using  probe  2  produces  a  higher 
response,  but  the  situation  is  reversed  when 
the  inspectors  change  probes.  Notice  that 
probe  1  is  not  uniformly  better  than  probe  2. 

If  an  interaction  is  suspected,  then  the 
experiment  should  be  designed  so  that  the 
interaction  effects  can  be  separated  from  the 
main  effects. 


Fig. A-1 Parallel lines indicate no 2-factor interaction


Fig.  A-2  Interactions  cause  the  lines  to  cross 


A.3  EXPERIMENTAL  DESIGN 

Input  variables  can  be  divided  into  two  groups: 
control  factors  and  noise  factors.  The  first 
group  contains  variables  which  are  to  be 
tested  at  different  levels.  (For  ET,  significant 
variables  may  be  inspector,  probe,  and 
position;  for  PT,  significant  variables  may 
include  inspector,  penetrant,  or  emulsifier 
processing  times).  The  second  group  contains 
those variables that either can be tested but for some reason are deemed less important to test, or cannot be identified and therefore cannot be tested (but can still cause variation in the system). Noise factors may be
changes  in  surface  preparation,  or  influence  of 
laboratory  humidity  and  temperature. 

The output response can be expressed as:

$$y = f(x_1, \ldots, x_p, x_{p+1}, \ldots, x_{p+r}, x_{p+r+1}, \ldots)$$

where

$x_1, \ldots, x_p$ are controlled in the test;

$x_{p+1}, \ldots$ are uncontrolled noise, of which

$x_{p+1}, \ldots, x_{p+r}$ can be tested but are not, and

$x_{p+r+1}, \ldots$ cannot be identified or tested.

To  quantify  the  POD(a)  relationship  for  an 
eddy  current  system,  a  typical  test  program 
would  proceed  as  follows.  First,  those 
knowledgeable about the specific inspection
process  would  decide  which  variables  are 
important  in  defining  the  response.  If  many 
variables  are  identified,  a  Pareto  analysis  may 
help  determine  which  are  the  more  important, 
and  thus  separate  the  significant  few  variables 
from  the  trivial  many  variables.  Once  the 
important  variables  are  determined  (say 
inspector,  probe,  and  position  of  the  specimen 
for  ET),  an  NDE  experiment  is  designed  to 
determine  their  effect  on  the  response.  A 
factorial  experiment,  discussed  in  A.3.2,  is 
recommended  for  most  cases,  although  many 
designs  exist  and  should  be  used  as 
appropriate. 

A.3.1  One-factor-at-a-time  Experiments 

A  one-factor-at-a-time  design,  as  the  name 
implies,  considers  each  factor  in  isolation.  To 
test  for  a  difference  in  probe  under  this  plan, 
two  probes  would  be  selected  and  specimens 
tested  using  these  probes  while  inspector  and 
position  are  held  constant.  In  the  past,  this  has 
been  a  common  method  of  experimentation. 
However,  there  are  more  efficient  ways  to 
gather  the  needed  information  (i.e.  fewer  tests 
are  required  using  other  methods).  There  are 
other  problems  with  the  one-factor-at-a-time 
method.  Because  the  other  variables  are  held 
unchanged,  the  observed  NDE  system 
responses  are  valid  only  for  that  specific 
setting  of  the  other  variables.  Therefore, 
interactive  effects  among  input  variables  are 
undetectable. It is also easier, with this method of experimentation, to mistake a correlation between input and response for cause and effect. Finally, the resulting POD(a) curves are less precise than they could otherwise be, because only one set of measurements is taken to estimate the influence of a specific variable.

A.3.2  Factorial  Experimentation 

A factorial NDE evaluation considers the influence of all factors simultaneously. A full factorial experiment is performed by choosing a number of levels for each of a number of factors (variables) and conducting the experiment for each possible combination of the factors. If there are L1 levels for the first variable, L2 for the second, ..., and Lk for the kth variable, then the experiment is called an L1 x L2 x ... x Lk factorial design. A 2x3x5 factorial design requires 2x3x5 = 30 runs. As an example, consider the 3 factors of the ET setup (PRobe, INspector, and POSition), each at 2 levels; this is a 2x2x2 = 8
run  factorial  experiment.  Figure  A-3  is  a  plot 
of  the  three  independent  (input)  variables 
for  this  example.  A  (+)  indicates  one  level 
of  either  the  probe  (PR),  inspector  (IN),  or 
position  (POS)  variable  and  a  (-)  indicates  the 
second  level.  Notice  that  the  cube  represents 
the  input  factors  only;  the  system  response  is 
not  being  plotted. 
Fig. A-3 A cube representing a full (2X2X2) factorial
experiment 


The  test  conditions  represented  by  this  cube 
are provided in Table A-1. In practice, run
numbers  are  assigned  to  the  tests  in  a  random 
order.  Randomization  is  required  to  minimize 
the  effects  of  those  factors  which  are  sources 
of  variation  for  the  response  and  have  not 
been  controlled  experimentally,  i.e.  the  noise 
factors. Errors can result from attempts to
save  time,  labor,  or  materials  by  choosing  a 
particular  non-random  run  sequence,  so 
careful  thought  and  planning  are  necessary 
prior  to  conducting  the  NDE  system 
evaluation. 
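Enumerating a full factorial and randomizing its run order are both easy to automate. A minimal sketch using Python's `itertools` and `random` modules: listing the factors as PR, IN, POS, with the last factor varying fastest, reproduces the test-condition numbering of Table A-1 (condition 2 is PR+, IN+, POS-). The helper names and the `seed` argument, included only so a randomization can be recorded and reproduced, are illustrative.

```python
import itertools
import random


def full_factorial(levels):
    """All combinations of factor levels, e.g. {"PR": ["+", "-"], ...}.

    The first factor varies slowest and the last factor fastest.
    """
    names = list(levels)
    return [dict(zip(names, combo)) for combo in itertools.product(*levels.values())]


def randomized_run_order(conditions, seed=None):
    """Return test-condition numbers (1-based) in a random run order.

    order[k] is the test condition performed as run k + 1.
    """
    order = list(range(1, len(conditions) + 1))
    random.Random(seed).shuffle(order)
    return order


conditions = full_factorial({"PR": ["+", "-"], "IN": ["+", "-"], "POS": ["+", "-"]})
print(len(conditions))  # 8
print(randomized_run_order(conditions, seed=1))
```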

The  number  of  levels  of  a  factor  to  include  in 
an  experiment  is  based  on  several 
considerations.  If  the  NDE  system  response 
is  linear,  then  two  levels  are  sufficient; 
nonlinear  factors  require  three  or  more  levels. 
The  number  of  natural  levels  a  variable 
possesses,  or  the  amount  of  variation  which  is 
expected,  can  also  influence  the  number  of 
levels  to  test.  Experience  suggests  that  2  to 
3  levels  are  appropriate  for  testing  variables 
in  an  ET,  UT,  PT,  or  MT  system.  (Other 
types  of  testing  situations  may  require  more 
than  3  levels  or  more  than  3  variables;  this 
will  be  discussed  shortly.) 

Factorial designs have three major benefits:

1. The design is more efficient, i.e. more information is gained for a given expenditure of labor, time, and material than with other methods.

2. Comparisons across levels of a factor (e.g. inspector or probe) are more precise, since average values are used rather than single observations. That is, all observations contribute to all comparisons among all factors; no single test exists only to evaluate a single factor. Notice in Table A-1 that the average of test conditions 1, 2, 3, 4 compared to the average of test conditions 5, 6, 7, 8 is a comparison of probe 1 results to probe 2 results, each with a sample size of 4. A comparison of 1, 2, 5, 6 vs. 3, 4, 7, 8 can be used to check for a difference between inspectors. Specimen position effects are estimated by comparing 1, 3, 5, 7 vs. 2, 4, 6, 8.

3. Interactions can be estimated. For example, the average response from tests 1, 2, 7, 8 vs. the average resulting from 3, 4, 5, 6 provides an estimate of the magnitude of the interaction of probe and inspector.
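Each comparison listed above is a contrast: the average response where a column of signs is (+) minus the average where it is (-). A minimal sketch, assuming two levels coded +1/-1 and one response per test condition in Table A-1 order; `effect` is an illustrative helper, not part of any particular statistics package.

```python
from itertools import product
from statistics import mean

# Sign table in Table A-1 order: PR varies slowest, then IN, then POS.
RUNS = [dict(zip(("PR", "IN", "POS"), s)) for s in product((1, -1), repeat=3)]


def effect(responses, *factors):
    """Average response at the (+) level minus average at the (-) level.

    With one factor this is a main effect; with two, the sign of each run is
    the product of the factors' signs, giving the interaction comparison.
    """
    plus, minus = [], []
    for run, y in zip(RUNS, responses):
        sign = 1
        for f in factors:
            sign *= run[f]
        (plus if sign > 0 else minus).append(y)
    return mean(plus) - mean(minus)


# Invented responses: additive PR and IN effects plus a PR x IN interaction.
ys = [100 + 10 * r["PR"] + 4 * r["IN"] + 3 * r["PR"] * r["IN"] for r in RUNS]
print(effect(ys, "PR"), effect(ys, "IN"), effect(ys, "POS"), effect(ys, "PR", "IN"))
```

For these invented responses the PR x IN contrast (tests 1, 2, 7, 8 vs. 3, 4, 5, 6) recovers the interaction, while the position effect is zero.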

A.3.3  Fractional  Factorial 
Experimentation 

The number of tests required by a full factorial design increases rapidly as the number of factors is increased. Even with a 2x2x2x2 = 2^4 = 16 run factorial design, the labor, time, and material needed to complete the design may be more than is available. It turns out, however, that because the factorial design is efficient and estimates variable effects more precisely than one-factor-at-a-time methods, useful results can be achieved by performing only a fraction of the full factorial. However, since fewer NDE settings are evaluated, something is lost: the ability to distinguish the main effects (PR, IN, POS) from the effects of some of the interaction terms is traded for the reduced test matrix. For example, in a full factorial experiment, PR may be identified as having a significant effect on the NDE response. In a fractional experiment, the effect of PR may be confused with the effect of the IN*POS interaction, and therefore the significance may be attributable either to the probe by itself or to an interaction of inspector and position. If this problem occurs, further experimentation can be performed to investigate these interactive effects without having to design a completely new experiment. This is not true of the one-factor-at-a-time approach.

The  example  in  Table  A-2  shows  how  the 
effects  which  are  confused,  or  confounded, 
with  one  another  can  be  determined  by 
comparing the "signs" in each column;
columns  with  all  signs  the  same  are  confused. 
Here  the  effects  of  IN  and  the  PR*POS 
interaction  are  confused,  the  effects  of  PR 
and  the  IN*POS  interaction  are  confused,  and 
the  effects  of  POS  and  the  IN*PR  interaction 
are  confused. 
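The sign comparison can be automated. A minimal sketch, assuming +1/-1 coding and the Table A-1 ordering of test conditions; applied to conditions 1, 4, 6 and 7, it reproduces the three confounded pairs listed above, while in the full factorial no main effect shares a column with an interaction. The helper names are hypothetical. The check follows the text and looks only for identical columns; in standard design-of-experiments terminology, columns that are exact negatives of each other are also aliased.

```python
from itertools import product

FACTORS = ("PR", "IN", "POS")
# Full 2^3 design in Table A-1 order (PR slowest, POS fastest), coded +1/-1.
FULL = [dict(zip(FACTORS, s)) for s in product((1, -1), repeat=3)]
# The half fraction: test conditions 1, 4, 6 and 7 of the full factorial.
HALF = [FULL[i - 1] for i in (1, 4, 6, 7)]


def column(runs, *factors):
    """Sign column for a main effect (one factor) or an interaction (several)."""
    col = []
    for run in runs:
        sign = 1
        for f in factors:
            sign *= run[f]
        col.append(sign)
    return col


def confounded(runs, effect_a, effect_b):
    """Two effects are confounded in these runs if their sign columns are identical."""
    return column(runs, *effect_a) == column(runs, *effect_b)


print(confounded(HALF, ("PR",), ("IN", "POS")))  # True
print(confounded(FULL, ("PR",), ("IN", "POS")))  # False
```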

Using  this  information,  a  fractional  factorial 
can  be  designed  by  setting  the  factors  of  PR, 
IN,  and  POS  at  two  levels  each.  This 
situation  can  be  represented  by  the  cube  in 
Figure  A-4. 

Four  tests  under  conditions  1,  4,  6,  7  of  the 
full  factorial  matrix  in  Table  A-1  would  be 
made;  these  points  are  found  in  Table  A-2. 
The  comparison  between  the  probe  levels 
would  be  made  by  comparing  the  average  of 
the  response  from  one  level  of  probe  (PR+) 
to  the  average  response  with  the  other  level  of 
probe  (PR-).  Notice  that  this  same  (fractional) 
data  will  also  allow  for  a  similar  test  between 
high  and  low  levels  of  both  inspector  and 
position.  Many  commercially  available 
software  packages  can  perform  these 
calculations.  The  analysis  of  NDE  experiments 
is  discussed  in  detail  in  Appendix  D. 

If  the  resulting  difference  in  the  response  is 
significantly  different  from  zero,  then  a 
change from one probe to another will have an
influence  on  the  NDE  response.  This  would 
indicate  that  reducing  the  amount  of  variation 
in  the  POD(a)  curve  would  require  more 
consistent  probes. 
Fig. A-4 A cube representing a fractional factorial
experiment 


Some  fractions  of  the  full  factorial  experiment 
are  better  than  others.  A  poorly  designed 
fractional  factorial  experiment  is  illustrated  in 
Table  A-3  which  shows  a  subset  of  the  full 
factorial  design  shown  in  Table  A-1.  Since 
the  (+)  and  (-)  signs  are  the  same  in  the  PR 
and  IN  columns,  this  test  confuses  the  PR 
and  IN  variables  with  each  other.  Conclusions 
about  PR  would  be  the  same  as  conclusions 
about  IN  since  all  levels  are  the  same  for 
each  test  condition.  Due  to  the  confused  main 
effects of PR and IN, it is inconceivable that
this  test  program  would  ever  be  run.  To  avoid 
this  problem  with  confused  variables,  an 
experimenter  must  know  before  the  test  is 
conducted  which  variables  and  interactions 
are  important  or  significant  and  design  the  test 
taking  this  into  consideration. 

It  may  be  necessary  to  extend  the  testing  to 
more  than  three  variables  or  more  than  three 
levels  of  the  variables.  A  factorial  or  fractional 
factorial  design,  or  one  of  several  other 
classes  of  designs,  can  be  created  to  test 
these  situations.  It  is  recommended  that 
someone  knowledgeable  in  statistical 
experimentation,  most  likely  a  professional 
statistician,  assist  in  the  NDE  demonstration. 


A.3.4  Experimentation  by  Sampling 

An alternative NDE evaluation design is to deliberately confuse the effects of all variables with each other and with experimental error. That is, the output response can be expressed as:

$$y = f(x_1, \ldots, x_p, x_{p+1}, \ldots, x_{p+r}, x_{p+r+1}, \ldots)$$

where

$x_1, \ldots, x_p$ are controlled in the test;

$x_{p+1}, \ldots$ are uncontrolled noise, of which

$x_{p+1}, \ldots, x_{p+r}$ can be tested but are not, and

$x_{p+r+1}, \ldots$ cannot be identified or tested.

To  estimate  the  POD(a)  relationship  and 
the  corresponding  lower  bound  in  a  situation 
when  the  system  has  been  demonstrated  to 
be  in  statistical  control,  or  for  periodic 
reevaluation  of  NDE  capability,  a  sampling 
approach  may  be  appropriate.  Here  the 
overall  system  performance  is  to  be 
quantified,  as  well  as  some  measure  of  the 
variability  which  can  be  expected. 

For  example,  consider  a  PT  process  with 
20  inspectors,  and  a  specified  range  of 
acceptable  values  for  penetrant  dwell  time, 
emulsifier  concentration,  and  emulsifier 
dwell  time.  Suppose  also  that  the  range  for 
emulsifier  concentration  can  be  reasonably 
represented  by  its  two  end-points,  but  the 
ranges  of  dwell  times  are  large  enough  to 
require  a  mid-point  representation  to 
augment  the  end-point  values.  A  full 
factorial  evaluation  would  require  360 
observations: 

20 inspectors × 3 penetrant dwell times × 2 emulsifier concentrations × 3 emulsifier dwell times = 360

To proceed with the sampling approach, the full factorial of these 360 observations would be tabulated. Next, a sample size, say 15 test runs, would be chosen, and a random sample of that size would be drawn from the 360 possible conditions; the 15 tests would then be performed in the randomly selected order. The resulting POD(a) would reflect error from all of the combined influences. If a large variation were observed, as indicated by the POD(a) confidence limit, its source(s) would be indistinguishable from the noise; that is, there would be no way to associate a deviation with its cause.
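The tabulate-then-sample procedure for the PT example can be sketched as follows. The level labels ("low", "mid", "high") and inspector names are hypothetical placeholders, since the text specifies only the number of levels of each variable. Python's `random.sample` returns the chosen conditions in selection order, so that order can serve directly as the randomized run order.

```python
import random
from itertools import product

# Hypothetical labels: the text gives only the number of levels of each variable.
INSPECTORS = [f"inspector_{i:02d}" for i in range(1, 21)]
PENETRANT_DWELL = ("low", "mid", "high")
EMULSIFIER_CONC = ("low", "high")
EMULSIFIER_DWELL = ("low", "mid", "high")


def full_factorial():
    """Tabulate every combination of the four variables (20 x 3 x 2 x 3 = 360)."""
    return list(product(INSPECTORS, PENETRANT_DWELL, EMULSIFIER_CONC, EMULSIFIER_DWELL))


def sample_runs(n_runs=15, seed=None):
    """Draw n_runs distinct test conditions at random; the draw order is the run order."""
    rng = random.Random(seed)
    return rng.sample(full_factorial(), n_runs)


if __name__ == "__main__":
    for k, run in enumerate(sample_runs(15, seed=1), start=1):
        print(k, *run)
```

Fixing the seed makes the selected run list reproducible for documentation purposes; omitting it gives a fresh random selection.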
Table A-1

Full Factorial Test Conditions for Figure A-3

Test         X     Y     Z
Condition    PR    POS   IN    PR*POS   PR*IN   POS*IN
1            +     +     +     +        +       +
2            +     -     +     -        +       -
3            +     +     -     +        -       -
4            +     -     -     -        -       +
5            -     +     +     -        -       +
6            -     -     +     +        -       -
7            -     +     -     -        +       -
8            -     -     -     +        +       +

Table A-2

Fractional Factorial Test Conditions for Figure A-4
(Columns With All Signs The Same Are Confounded)

Test         X     Y     Z
Condition    PR    POS   IN    PR*POS   PR*IN   POS*IN
1            +     +     +     +        +       +
4            +     -     -     -        -       +
6            -     -     +     +        -       -
7            -     +     -     -        +       -

Table A-3

An Improper Fractional Factorial Experiment Confuses the Main Effects
(Columns With All Signs The Same Are Confounded)

Test         X     Y     Z
Condition    PR    POS   IN    PR*POS   PR*IN   POS*IN
1            +     +     +     +        +       +
2            +     -     +     -        +       -
7            -     +     -     -        +       -
8            -     -     -     +        +       +

Box, Hunter, and Hunter, Statistics for Experimenters, Wiley, 1978, provides an excellent discussion of the design and analysis of industrial experiments.
APPENDIX B

FABRICATION,  DOCUMENTATION  &  MAINTENANCE 
OF  RELIABILITY  ASSESSMENT  SPECIMENS 


This appendix presents general guidance for manufacturing NDE reliability specimens when no existing specimen set can provide an adequate evaluation of the NDE process in question. General guidelines for maintaining the specimens between inspections are also included.

B.1  DESIGN 

Specimen geometry should be similar to that of the parts being inspected, and hole sizes should be typical of those found in engines. Specimens representative of particular part geometries should be used when that information is known and there is reason to expect the inspection to be geometry dependent. Specimen size should be such that inspecting the specimens is reasonably similar to inspecting actual parts. Small specimens may require scanning motions completely divorced from those used in production; this should be avoided to the extent practical. Some system evaluation data may need to come from inspection of actual engine hardware. This is particularly true of systems that depend on line-of-sight inspection, such as PT. For this system evaluation, the USAF defines a selection of engine hardware, preferably with field-induced cracks.

Machining tolerances for the specimens should be similar to those for the engine hardware to be inspected. Specimens should be manufactured to cover the range of sizes allowed; e.g., if a typical hole has an allowable diameter range of 0.015 in. (including MRB and potential rework), the specimens used for inspection system evaluation should span at least that range. This may not be a significant concern for some features with particular inspection methods; for example, hole size tolerances may not be an issue for PT inspections.

Environmental conditioning to represent conditions such as in-service oxidation should be included in the specimen fabrication if those conditions can be realistically simulated. The simulation should first be demonstrated on a small sample of specimens to verify its validity.


B.2  FABRICATION 

B.2.1  Processing  of  Raw  Material 

To the extent that the specific applications of the NDE system are known, it may be possible to specify the raw material processing of the test specimens. Issues to be considered should include the processing technique (e.g., forging: isothermal, upset, flow patterns; powder metal: mesh size, HIP; casting; extruding). Heat treatment of the specimens should reflect that seen by the parts, as should the machining processes (turning, grinding, broaching, EDM, etc.). If the applications are not known precisely, specimens representative of production parts currently receiving similar inspections should be selected.

B.2.2  Establish  Machining  Parameters 

Machining  parameters  have  to  be  established 
for  each  desired  specimen  geometry  to 
simulate  the  component  fabrication  conditions. 
As an example, the following details are presented for a specimen with a crack located at the intersection of a cooling hole with a countersink, as might be present in a turbine disk. Figure B-1 illustrates the component geometry. Figures B-2 and B-3 give the crack geometry relationship obtained from the destructive evaluation. Figure B-4 shows how a given final crack can be plotted graphically for a given initial crack with a 0.280 inch diameter hole drilled at a 25° angle to the surface with a 38° countersink. The machining of this specimen was accomplished on a Knight vertical milling machine. The specimen was held on an angled fixture which established the hole center line angle (25°) and center line position (0.096 inches from the crack center). A drill guide was placed on top of the specimen, and cobalt drills and reamers were used to generate the hole. The countersink machining parameters were generated by trial and error with dummy holes until the proper depth and location were established; the countersink was then machined in the specimen with the specimen held horizontal in the milling machine.
[Fig. B-1: 1ST TURBINE DISK]

[Fig. B-2]

[Fig. B-4]


Because  the  final  machining  of  the  specimens 
has  a  direct  effect  on  surface  crack  size, 
shape,  and  aspect  ratio,  and  on  internal 
defect  location,  it  is  important  that  the 
specimen  blank  be  machined  to  the  same  tight 
tolerances  as  the  final  specimen  will  be. 

Since several thousandths of an inch of material will subsequently be machined off, the processing of the blank is critical only to the degree that the machining produces cold working or heating effects extending to the depth of the finished specimen surface. For this reason, the machining parameters, such as depth of cut, should be specified, held constant over the population of specimens, and documented for future reference.

B.2.3  Defect  Insertion 

Simulated  machining  defects  are  inserted  into 
the  finish  machined  specimen.  Surface 
cracks  shall  be  grown  from  EDM  notches  or 
tack  welds.  If  the  relation  of  specimen 
scanning  and  crack  orientation  is  known,  this 
should  be  accounted  for  in  the  crack 
generation.  If  this  relation  is  not  known,  the 
crack  orientation  should  be  random,  relative 
to  the  edges  of  the  specimen.  The  machining 
of  the  EDM  notch  shall  be  closely  defined  and 
documented  to  assure  repeatable  notches,  in 
terms  of  the  notch  dimensions  and  also  in  the 
amount  of  recast  layer  and  heat-affected 
zone. Cracks shall be grown from these EDM notches by stress cycling the specimen at a stress level that grows the cracks without measurable plastic deformation. Cyclic lives
(to  the  desired  crack  lengths)  should  be 
between  approximately  10,000  and  50,000 
cycles.  Cyclic  loads  or  strains  should  be  well 
documented  to  assure  consistent  application 
over  the  specimen  population.  Depending 
upon  specimen  geometry,  the  cracks  can  be 
induced  by  a  tensile  load  (applied  uniformly 
over  the  cross-section  of  the  specimen)  or 
three-point  or  four-point  bending. 
Environmental  conditions  under  which 
service-induced  cracking  would  be  introduced 
will  be  simulated  to  the  extent  reasonable. 
This  simulation  should  be  tried  first  on  a  small 
sample  of  specimens  to  establish  its  realism. 

Internal defects can be generated by milling shallow (< 0.003 in. deep) holes into the face of a block that is then diffusion bonded to a mating block. Because of the requirements of the diffusion bonding process, the mating surfaces must be very carefully machined. This will also facilitate the necessary flaw location and machining parameter documentation.

Flaw documentation must include critical parameters, such as flaw depth, length, width, and bottom radius; for examples, see Figures B-5 through B-8. All of the defects should be documented, including their position and orientation. For internal defects, the size and shape of the defect should be recorded. For surface cracks, the size and shape of the starter notches should be recorded, as should the stress cycling imposed to generate the cracks, including the loads and number of cycles.
B.2.4 Final Machining

Specimens  will  require  final  machining  to 
remove  misalignment  of  bonded  surfaces, 
provide  finished  contour,  and  remove  starter 
notches.  Especially  for  the  last  function,  it  is 
critical  that  tight  dimensional  tolerances  be 
maintained.  The  amount  of  material  removed 
can  have  a  significant  effect  on  the  final  shape 
and  size  of  the  defect.  A  magnified  visual 
inspection  must  be  conducted  to  verify 
complete  removal  of  the  starter  notch.  Some 
of  each  population  will  need  to  be  fractured  for 
the  specimen  verification  described  in  Section 
B.2.5. 

Final  machining  procedures  for  the  specimens 
must  be  carefully  followed  and  documented. 
The  specimens  used  for  system  evaluation 
should  be  machined  to  the  same  parameters 
as  the  parts  to  be  inspected.  Where  specific 
applications  are  not  known,  or  where  the 
specimens  cannot  be  machined  in  this 
manner,  specimens  with  surface  conditions 
typical  of  the  types  of  parts  to  be  inspected 
should  be  used.  Surface  condition  refers  to 
such  factors  as  finish  and  texture  and  to  the 
presence  or  absence  of  machining  or  handling 
marks  or  damage. 

B.2.5  Defect  Verification 

Both  the  aspect  ratio  and  length  of  the  fatigue 
cracks  shall  be  verified.  Specimen 
dimensional  information  should  be  recorded. 
These data must concentrate on characterizing the flaws with regard to position, orientation, and size. For surface-connected cracks, measured lengths (and depths for hole specimens) should be recorded for all cracks. This measurement is best accomplished by magnified (approximately 40×) optical measurement with the specimen under approximately 60% of the load used during the crack growth cycling. The aspect ratio shall be
verified  by  breaking  open  a  sufficient  number 
of  specimens  as  defined  in  the  CDRL  prior  to 
final  machining.  To  break  open  a  crack,  cut 
to  within  0.050  inches  of  each  end  of  the 
crack  with  a  saw  or  cut  off  wheel,  then  fracture 
the  specimen  with  a  single  load  application. 
Establish  the  crack  contour  to  surface  length 
relationship.  Failure  to  meet  the  estimated 
aspect  ratio  within  the  limits  specified  by  the 
Statement  of  Work  (SOW)  or  failure  to 
repeatedly  reproduce  an  aspect  ratio  within 
the  specified  limits  will  require  modification  of 
the  crack  generation  procedure  until  this 
requirement  is  met.  Once  the  desired  aspect 
ratio  can  be  demonstrated,  all  fatigue  crack 
lengths  shall  be  measured  to  within  0.002 
inches  in  the  final  machined  configuration. 
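The aspect ratio check on the broken-open specimens reduces to simple arithmetic. The sketch below assumes the aspect ratio is defined as surface length divided by depth (the definition c = crack length/crack depth used in Appendix C); the (length, depth) pairs and the limits are hypothetical stand-ins for the measured values and the limits in the Statement of Work.

```python
def aspect_ratios(fractured):
    """Aspect ratio c = surface length / depth for each broken-open crack."""
    return [length / depth for length, depth in fractured]


def aspect_ratio_within_limits(fractured, c_min, c_max):
    """True if every measured aspect ratio lies within [c_min, c_max]."""
    return all(c_min <= c <= c_max for c in aspect_ratios(fractured))


# Hypothetical (surface length, depth) pairs in inches from fractured specimens.
fractured = [(0.030, 0.015), (0.024, 0.011), (0.040, 0.019)]
print([round(c, 2) for c in aspect_ratios(fractured)])
print(aspect_ratio_within_limits(fractured, 1.8, 2.4))
```

A failed check would, per the text, send the crack generation procedure back for modification before the final length measurements are made.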

Specimen flaw response should be documented for all specimens using a standard test technique that is available to an independent agency or to the contracting agency that will be the specimen custodian. For systems in which the magnitude of the signal response, â, will be used in determining the POD(a) relationship, the flaw response should be recorded at least six times to provide an estimate of test-to-test scatter. Specimen re-verification will involve comparing the results of periodic repetitions of this test with these original results.

The size and shape of the embedded defects produced by diffusion bonding shall be verified by sectioning, as required by the contract. The size and shape of other types of embedded defects shall be verified as specified by the contracting agency.


B.3  MAINTENANCE 

Specimens  are  to  be  maintained  as  described 
in  Section  4.3.2.  The  goal  of  this  requirement 
is  to  preserve  specimen  integrity  for  the 
purpose  of  inspection  system  evaluation. 

B.3.1  Handling 

Specimens  should  be  stored  in  carrying  cases 
where  they  will  not  be  subject  to  metal-to- 
metal  contact.  This  is  to  prevent  scratching 
the  specimens  or  damaging  the  cracks  in 
them  accidentally.  To  assure  truly  back-to- 
back  system  evaluations,  it  is  imperative  that 
the  specimens  be  the  same  from  one  test  to 
the  next. 

B.3.2  Cleaning 

Because the inspection process may leave residual material in surface-connected defects (e.g., penetrant from PT inspections), and because this material may affect later test results, it is imperative that each specimen be thoroughly cleaned after each use. When the inspection does not use a contaminating fluid (as with ET or UT), wiping the specimen with a soft, lint-free cloth may be sufficient; acetone on the cloth may be useful. Where a penetrant is used, ultrasonic cleaning is necessary. Vapor degreasing may also be appropriate. All chemicals that contact the specimens should be checked to assure that they do not damage the specimen material.

To  maintain  specimen  integrity,  the  specimens 
should  not  be  subject  to  any  metal-removing 
process  such  as  polishing,  sanding  or  etching. 

B.3.3  Shipping 

Because  the  same  specimens  may  be  needed 
for  several  system  demonstrations,  and  to 
lower  the  risk  of  damage  to  the  specimens  in 
transit,  the  cases  containing  the  specimens 
should  be  hand-carried  from  program  to 
program,  or  shipped  by  Next  Day  Air  Freight. 
Packaging  must  be  sufficient  to  allow  for  the 
rough  handling  that  can  be  expected. 

B.3.4  Storage 

USAF specimens are stored in an office-type
environment  at  Wright-Patterson  Air  Force 
Base.  The  materials  laboratory  is  the 
organization  responsible  for  maintaining  the 
inventory  of  the  specimens.  However,  the 
engineering  organization  is  the  point  of  contact 
for  requesting  use  of  the  specimens  for 
particular  testing  programs.  This  is  an 
example  of  what  could  be  done  with  other 
programs. 

B.3.5  Revalidation 

USAF specimen flaw responses will be measured by the materials laboratory at least annually, or prior to use, using the same test technique and procedure used in the original specimen verification (Section B.2.5). The flaw response must fall within the range of the responses measured in the original verification process. If it does not, the results must be examined to determine whether the specimen has been unacceptably compromised or is salvageable but needs to be re-characterized and verified.
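The verification and revalidation rules for signal-response systems (at least six repeat readings per flaw, and a later reading that must fall within the range of the original readings) can be sketched as below. The readings are hypothetical values in scale divisions, and the function names are illustrative only.

```python
import statistics

MIN_REPEATS = 6  # flaw response "recorded at least six times"


def scatter_summary(readings):
    """Mean, standard deviation, and range of repeated flaw-response readings for one flaw."""
    if len(readings) < MIN_REPEATS:
        raise ValueError(f"need at least {MIN_REPEATS} repeat readings, got {len(readings)}")
    return {
        "mean": statistics.mean(readings),
        "stdev": statistics.stdev(readings),  # estimate of test-to-test scatter
        "min": min(readings),
        "max": max(readings),
    }


def revalidate(original_readings, new_reading):
    """True if a revalidation reading falls within the range of the original readings."""
    summary = scatter_summary(original_readings)
    return summary["min"] <= new_reading <= summary["max"]


# Hypothetical original verification readings (scale divisions) for a single flaw.
original = [5.1, 5.3, 4.9, 5.0, 5.2, 5.1]
print(scatter_summary(original))
print(revalidate(original, 5.05), revalidate(original, 4.5))
```

A reading outside the range does not by itself condemn the specimen; as stated above, it triggers an examination of whether the specimen is compromised or needs re-characterization.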


APPENDIX  C 


MODELING  PROBABILITY  OF  DETECTION 


This appendix discusses the mathematical and statistical procedures which have been implemented in the standard POD(a) software. This software is available through the United States Air Force, ASC/ENFSA, Wright-Patterson AFB, Ohio, USA, 45433.

C.1  Background 

Early attempts to quantify probability of detection (POD) took the number of cracks detected, n, divided by the total number of cracks inspected, N, as a reasonable assessment of system inspection capability: POD = n/N. This resulted in a single number for the entire range of crack sizes. Since larger cracks are easier to find than smaller ones, cracks were often grouped according to size and n/N calculated for each size range, as illustrated in Figure C-1. Grouping specimens this way improved the resolution in crack size, but the resolution in POD suffered because there were fewer specimens in each range. Any attempt to improve the resolution in POD by having more specimens in a given group would necessarily decrease the resolution in crack size. Several methods, such as moving averages and binomial distribution methods, were proposed to circumvent this problem, but they required very large sample sizes and suffered from other analytical difficulties.
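The size-binned n/N calculation described above can be sketched in a few lines. The crack sizes and hit/miss outcomes below are hypothetical, chosen only to show the trade-off: each bin's POD estimate rests on just the few specimens that fall in it.

```python
def binned_pod(sizes, hits, edges):
    """Return (lo, hi, n, N, n/N) for each crack-size bin [lo, hi)."""
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [h for a, h in zip(sizes, hits) if lo <= a < hi]
        n_total = len(in_bin)
        n_found = sum(in_bin)
        results.append((lo, hi, n_found, n_total, n_found / n_total if n_total else float("nan")))
    return results


# Hypothetical crack sizes (inches) and outcomes (1 = detected, 0 = missed).
sizes = [0.005, 0.008, 0.010, 0.012, 0.015, 0.018, 0.020, 0.025, 0.030, 0.040]
hits = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
for row in binned_pod(sizes, hits, [0.0, 0.010, 0.020, 0.050]):
    print(row)
```

Narrowing the bins sharpens the size resolution but leaves fewer specimens, and hence a coarser and less certain n/N, in each bin.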

The methods in this document are based on a POD(a) model, a mathematical description of the relationship between the size of a crack or defect, a, and its probability of detection, POD. The parameters of the model are estimated by choosing the values that make the observed inspection results most likely (maximum likelihood estimation, Section C.2.3).

C.2 Modeling Probability of Detection, â vs. a

The lognormal formulation of the POD(a) model is a natural consequence of the observed behavior of â vs. a data, and will be developed here in that context. The same lognormal model will be seen to apply also to inspection data where no size information is available. The situation for pass/fail, or hit/miss, data will be discussed later.

Some NDE procedures provide a signal response that is correlated with crack size if the crack is detected. The data presented as an example in Table C-1 are for eddy current testing (ET), in which the magnitude of the eddy current signal is quantitatively correlated with crack size.

Fracture mechanics nomenclature defines crack depth as a, and the NDE literature refers to the crack size indication, or apparent crack size, as â, the idea being that â is correlated with a. Consider the 30 specimens given in Table C-1, where every fatigue crack of size a (measured in inches) has an associated apparent size, â (measured in scale divisions). The units of actual crack size are those usually associated with crack depth (e.g., mils, inches, mm, microns), although crack length or crack area is sometimes used as the correlative parameter. By contrast, the units of apparent crack size can be nearly anything, e.g., millivolts, number of contiguous illuminated pixels, total signal counts, or percent of some maximum scale reading. In this discussion these units are major scale divisions representing the signal output of the semi-automated system on which the measurements were made.

In any real inspection, some fatigue cracks may be too small to be detected by the inspection apparatus. The system output signal, â, is not zero; it is just indiscernible from the noise, i.e., less than â_th. These misses have no associated â value and so are left-censored.

Similarly, cracks which are sufficiently large can overwhelm the system, resulting in a saturated signal. Again, the apparent size, â, is unknown, other than that it exceeds some saturation level, â_sat. These saturated observations are right-censored.


Table C-1

â vs. a Data

Bolthole Specimens, Semi-Automated Inspection

a        â        a        â        a        â
0.001    *        0.012    2.2      0.022    7.7
0.004    *        0.012    3.4      0.023    11.6
0.005    1.5      0.012    2.4      0.023    8.0
0.006    *        0.015    3.0      0.028    **
0.006    1.2      0.016    7.3      0.029    **
0.006    2.6      0.018    7.3      0.030    13.2
0.008    1.2      0.018    4.0      0.034    19.6
0.008    2.8      0.019    5.0      0.036    16.2
0.008    1.6      0.020    7.3      0.052    19.2
0.009    2.7      0.020    11.6     0.058    19.6

Notes:

1. a is crack size in inches.
2. â is apparent size (see text).
3. *, ** censored observations:
   *  unknown, below â_th = 1.0
   ** unknown, above â_sat = 20.0


Given the â vs. a data, it is necessary to estimate the probability of detecting a crack of size a, POD(a). The POD(a) function is defined as


$$\mathrm{POD}(a) = P(\hat a > \hat a_{dec}) \qquad \text{[C-1]}$$


where â_dec is a predetermined detection threshold. This threshold may be set near the system noise level for maximum crack detection sensitivity, or somewhat above the noise level to improve the system's discrimination.

On occasion a signal will exceed â_th when there is no actual crack. This can result from noise introduced by the inspection itself (e.g., an improper scan plan, surface irregularities, or probe lift-off), from some real but innocuous discontinuity in electrical conductivity or magnetic permeability within the material, or simply from setting the pass/fail criterion too close to the material's noise threshold. (The difficulties in assessing these false calls are noted in Section 6.) In any case, a part found to have a questionable indication is subjected to further scrutiny, usually cellulose acetate replication and subsequent microscopic examination.

Signal  responses  which  are  either  obscured  by  noise,  or  too  large  to  be  measured,  are  called 
censored  observations.  Censored  observations  are  not  the  same  as  missing  observations;  the 
treatment  of  missing  data  is  discussed  in  Section  4.5. 

C.2.1 Developing the â vs. a Model

Referring to Figure C-2, it is seen that the logarithms of â and a can be linearly related. This linear relationship between log â and log a is useful, so for the remainder of this discussion, let:

$$x = \log a \quad\text{and}\quad y = \log \hat a .$$

The relationship between â and a can now be expressed as:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

and in Figure C-2 the residual, ε, is observed to be approximately normally distributed with zero mean and variance δ². Several dozen collections of similar data have been studied, and the linear relationship with approximately normal residuals occurs quite frequently, but not always. For some analyses, it has been necessary to restrict the range of crack sizes to ensure these properties. The residuals of the ten inspections reported here are presented collectively in Figure C-3.

The POD(x), P(y > y_th), is illustrated in Figure C-2 as the shaded region under the normal density of y at log crack size x. As one moves along the x axis, the location (mean) of the normal density of log â values changes (the mean is β0 + β1x), and thus the POD also changes.

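Now, under the above assumptions,

$$z = \frac{y - (\beta_0 + \beta_1 x)}{\delta} \qquad \text{[C-2]}$$

has a standard normal distribution; i.e.,

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad \text{the standard normal pdf, and}$$

$$Q(z) = \int_z^{\infty} \phi(q)\, dq, \quad \text{the standard normal survivor function.}$$

Then

$$\mathrm{POD}(x) = P(y > y_{th}) = Q\!\left[\frac{y_{th} - (\beta_0 + \beta_1 x)}{\delta}\right]$$

and, since Q(-w) = 1 - Q(w) and β1 > 0,

$$\mathrm{POD}(x) = 1 - Q\!\left[\frac{x - (y_{th} - \beta_0)/\beta_1}{\delta/\beta_1}\right] \qquad \text{[C-3]}$$

Hence the POD function is a cumulative normal distribution function with parameters

$$\mu = \frac{y_{th} - \beta_0}{\beta_1} \quad\text{and}\quad \sigma = \frac{\delta}{\beta_1} .$$

With these parameters,

$$\mathrm{POD}(a) = 1 - Q\!\left[\frac{\log a - \mu}{\sigma}\right] \qquad \text{[C-4]}$$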

Notice  that  although  POD(a)  has  the  form  of  a  cumulative  distribution  function,  it  does  not 
represent  the  cumulative  probability  of  occurrence  of  a  crack  of  size,  a.  It  represents  the 
probability  of  detection  of  cracks  of  size,  a. 
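Equation C-4 can be evaluated directly once β0, β1, δ and y_th are known. The sketch below assumes natural logarithms (any base works if used consistently) and uses `statistics.NormalDist` from the Python standard library, whose `cdf` equals 1 - Q. The parameter values are hypothetical placeholders, not estimates from Table C-1. It also inverts the curve to give the crack size detected with a chosen probability, e.g. a90.

```python
import math
from statistics import NormalDist


def pod_curve(b0, b1, delta, y_th):
    """POD(a) = 1 - Q[(ln a - mu)/sigma] (eq. C-4), returned as a normal distribution on ln a."""
    if b1 <= 0:
        raise ValueError("the slope b1 of log a-hat vs log a must be positive")
    mu = (y_th - b0) / b1
    sigma = delta / b1
    return NormalDist(mu, sigma)


def pod(a, dist):
    """Probability of detecting a crack of size a (same units as used in the fit)."""
    return dist.cdf(math.log(a))


def a_p(p, dist):
    """Crack size detected with probability p; p = 0.90 gives a90."""
    return math.exp(dist.inv_cdf(p))


# Hypothetical parameters, NOT estimates from Table C-1; y_th = ln(1.0) = 0.
dist = pod_curve(b0=5.0, b1=1.2, delta=0.3, y_th=math.log(1.0))
a90 = a_p(0.90, dist)
print(f"a50 = {a_p(0.5, dist):.4f} in, a90 = {a90:.4f} in, POD(a90) = {pod(a90, dist):.3f}")
```

Representing the curve as a `NormalDist` on ln a makes the point of the paragraph above concrete: the POD curve has the form of a cumulative distribution, but it is evaluated at a fixed crack size rather than describing how crack sizes occur.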

C.2.2  Effects  of  Uncertainty  in  Crack  Aspect  Ratio 

Equation  C-4  expresses  the  probability  of  detection  in  terms  of  a  crack  size,  a.  In  some 
experiments, the crack size in the test specimens might be known exactly. For example, in
experiments  for  which  the  POD  would  be  calculated  in  terms  of  a  crack  length  measured  on  the 
surface  or  in  experiments  using  diffusion  bonded  specimens  with  exactly  defined  subsurface 
voids,  the  true  crack  size  would  be  known.  In  the  general  NDE  reliability  experiment,  the  crack 
size  must  be  inferred  from  an  assumed  or  observed  crack  aspect  ratio  based  either  on 
destructive  tests  of  a  few  specimens  or  on  experience  with  the  method  used  to  produce  the  test 
specimens.  In  this  general  case,  the  differences  between  the  true  and  inferred  crack  sizes  will 
have  an  effect  on  the  POD(a)  function.  Given  a  set  of  specimens  for  which  both  the  true  and 
estimated  sizes  are  known,  the  effect  of  using  the  estimated  crack  sizes  in  obtaining  the 
POD(a)  parameters  can  be  quantified.  The  following  presents  a  method  for  assessing  the 
magnitude  of  the  effect  of  using  an  estimate  of  the  crack  size  rather  than  the  true  (and  generally 
unknown)  value.  (Cochran,  1968) 

Define aspect ratio as c = crack length/crack depth. Assume that the relation between the measured crack length, a_m, and the true crack depth, a_t, is given by:

$$\log a_t = \log a_m - \log c + \eta$$

where η is normally distributed with zero mean and constant standard deviation σ_η. The term η accounts for the difference between the crack depth calculated by assuming a constant crack aspect ratio and the true crack depth. In the initial analyses of this appendix, the random error term, η, was ignored; i.e., it was assumed that the aspect ratio exactly correlated crack length and depth.

Assuming that η has zero mean implies that the estimate of the true crack size is unbiased. Assuming that η has constant variance implies that the random error is proportional to the size
of  the  crack.  These  assumptions  were  reasonable  for  the  specimens  that  were  destructively 
inspected  during  the  specimen  development  phase  of  the  RFC  Program. 

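Interpreting a_m as a, and substituting the linear model for log â (with log a_m = log a_t + log c - η) into equation C-1 for the calculation of POD(a), gives

$$
\begin{aligned}
\mathrm{POD}(a_t) &= P[\hat a > \hat a_{dec}] = P[\log \hat a > \log \hat a_{dec}] \\
&= P[\beta_0 + \beta_1 \log a_m + \varepsilon > \log \hat a_{dec}] \\
&= P[\beta_0 + \beta_1(\log a_t + \log c - \eta) + \varepsilon > \log \hat a_{dec}] \\
&= P[\varepsilon - \beta_1 \eta > \log \hat a_{dec} - \beta_0 - \beta_1(\log a_t + \log c)]
\end{aligned}
$$

Let ζ = ε - β1η and assume that ε and η are independent. Then ζ is normally distributed with zero mean and variance

$$\sigma_\zeta^2 = \delta^2 + \beta_1^2 \sigma_\eta^2 .$$

Thus the variability observed about the â vs. a_t relationship is inflated by an amount β1²σ_η². The POD(a) function is then, after simplification:

$$\mathrm{POD}(a_t) = 1 - Q\!\left[\frac{\log a_t + \log c - (\log \hat a_{dec} - \beta_0)/\beta_1}{\sigma_\zeta/\beta_1}\right] \qquad \text{[C-5]}$$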
Very little experience has been acquired in analyzing the relation between the true and measured crack sizes. In the experiments conducted during 1985-1988 to evaluate the RFC NDE system, the value of σ_η was observed to be significantly smaller than δ, so the effect of scatter about the crack aspect ratio was negligible, and equation C-3 is used.
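Equation C-5 differs from C-4 only through the inflated scatter σ_ζ and the log c shift, so its effect is easy to explore numerically. The sketch below assumes natural logarithms and uses hypothetical values for β0, β1, δ, â_dec and c; `sigma_eta` is the standard deviation of η. At the median size the POD is 0.5 regardless of σ_η, while above the median a larger σ_η lowers the POD, i.e. the curve flattens.

```python
import math
from statistics import NormalDist


def pod_aspect(a_t, c, b0, b1, delta, sigma_eta, a_dec):
    """Eq. C-5: POD for true depth a_t when depth is inferred from length via aspect ratio c.

    sigma_eta is the standard deviation of eta in log a_t = log a_m - log c + eta.
    """
    sigma_zeta = math.sqrt(delta ** 2 + (b1 * sigma_eta) ** 2)  # inflated scatter
    mu = (math.log(a_dec) - b0) / b1
    # 1 - Q[(x - mu)/s] is the normal CDF with mean mu and standard deviation s.
    return NormalDist(mu, sigma_zeta / b1).cdf(math.log(a_t) + math.log(c))


# Hypothetical parameters, for illustration only.
params = dict(c=2.0, b0=5.0, b1=1.2, delta=0.3, a_dec=1.0)
for sigma_eta in (0.0, 0.05, 0.2):
    print(sigma_eta, round(pod_aspect(0.01, sigma_eta=sigma_eta, **params), 4))
```

With σ_η small relative to δ, as reported for the RFC experiments, the printed POD values barely move, which is the justification for using equation C-3.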

C.2.3  Maximum  Likelihood  Estimators 

The estimates of the POD(a) parameters discussed in this document are maximum likelihood estimates (MLEs), which have several desirable statistical properties. Two are especially important.

1.  MLEs  are  sufficient  statistics.  That  is,  for  a  given  underlying  statistical  model,  knowing 
the  MLE  is  just  as  good  as  knowing  the  actual  sample  data,  as  far  as  knowing  the  true 
values  of  the  model  parameters  is  concerned. 

2. MLEs themselves have a known sampling distribution. For large samples this distribution is very nearly normal and is centered at the true parameter values.

Because this normal behavior is fundamental to much of the analysis of NDE data, a brief discussion of likelihood is in order. Likelihood is analogous to probability, but with a subtle twist: a probability distribution describes the behavior of the data, given the distribution's parameters, θ; by comparison, the likelihood describes the behavior of the parameters, given the data.

The data are considered fixed, since they have already been observed; it is the model parameters, then, which vary according to the given statistical model. This is written as L(θ; X), where X denotes the matrix of observed values. The mathematical formulation of the likelihood and its corresponding probability density are identical; they differ only in whether the data are considered fixed (likelihood) or the parameters are considered fixed (probability).

The variance-covariance matrix, which summarizes the behavior of the maximum likelihood estimators, can itself be estimated from the sample data. Thus, the likelihood function provides not only the model parameters, but estimates of their variability as well.
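As a minimal illustration of that point, the sketch below takes the simplest possible case, an uncensored normal sample, where both the MLEs and the inverse Fisher information have closed forms. It is a generic textbook example rather than part of the POD procedure, and the function name is hypothetical.

```python
import math

def normal_mle_with_covariance(data):
    """MLEs of a normal mean and standard deviation, plus their asymptotic
    variance-covariance matrix, i.e. the inverse Fisher information
    evaluated at the estimates."""
    n = len(data)
    mu = sum(data) / n
    # The ML estimate of the standard deviation uses divisor n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in data) / n)
    # For a normal sample I(mu, sigma) = diag(n / sigma^2, 2n / sigma^2),
    # so its inverse is also diagonal.
    cov = [[sigma ** 2 / n, 0.0],
           [0.0, sigma ** 2 / (2 * n)]]
    return mu, sigma, cov

mu_hat, sigma_hat, cov = normal_mle_with_covariance([1.0, 2.0, 3.0, 4.0])
print(mu_hat, sigma_hat, cov)
```

The same pattern (estimates from the maximum, variability from the inverted information) is what the censored and hit/miss procedures below carry out numerically.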

The  asymptotically  normal  behavior  of  the  maximum  likelihood  parameter  estimators  is  exploited 
to  provide  confidence  bounds  for  POD(a)  curves  (section  C.3.2)  and  to  make  statistical 
comparisons  between  and  among  different  inspections  (Appendix  D). 

C.2.4 Parameter Estimation, â vs. a

To determine the relationship, POD(a), it is necessary to estimate β0, β1 and δ of equation C-2. For uncensored data, these can be determined using the familiar least-squares regression equations.
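A minimal sketch of those least-squares equations follows. It assumes that equation C-2 has the form log â = β0 + β1 log a + ε, with ε normal, mean zero and standard deviation δ, which is the form used in the likelihood expressions of this section; the function name and the noise-free test data are illustrative only.

```python
import math

def fit_ahat_vs_a(a, ahat):
    """Least-squares fit of log(ahat) = beta0 + beta1 * log(a) + error using
    uncensored data only. delta is the residual standard deviation with the
    usual n - 2 regression divisor (the ML estimate would divide by n)."""
    x = [math.log(v) for v in a]
    y = [math.log(v) for v in ahat]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * xbar
    sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    delta = math.sqrt(sse / (n - 2))
    return beta0, beta1, delta

# Noise-free illustration: ahat = 2 * a**1.5, so beta0 = log 2, beta1 = 1.5, delta = 0.
sizes = [0.002, 0.004, 0.008, 0.016]
b0, b1, d = fit_ahat_vs_a(sizes, [2.0 * s ** 1.5 for s in sizes])
print(b0, b1, d)
```

With real data δ is nonzero; the n − 2 divisor is the usual regression choice, whereas an ML fit to the same points would divide by n.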

When some observations are censored, i.e. no valid â value exists for them, the regression approach becomes untenable. This is because the true location of a censored observation is unknown, other than that it lies below the noise threshold or above the system signal saturation level. Since the true location is unknown, the difference between the observation and the model is also unknown, and the equations based on minimizing this (squared) deviation are therefore unworkable.

In this circumstance, the method of maximum likelihood can be used to obtain parameter estimates for the censored data. Lawless (1982) discusses a generalized case of a normal parametric model where the data are right-censored. For data influenced by both right- and left-censoring, order the data so that â1 < â2 < ... < ân, and let the index

i = 1, ..., m represent data obscured by system noise (â < â_th),

i = m + 1, ..., m + r represent data for which a valid signal response exists, and

i = m + r + 1, ..., n represent saturated signal data (â > â_sat).
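The likelihood of an observation at z is (1/δ)φ(z), and the likelihood for the set of independent, uncensored observations is then

$$ L(\beta_0, \beta_1, \delta;\, \mathbf{x}, \mathbf{y}) = \prod_{i=m+1}^{m+r} \frac{1}{\delta}\,\phi(z_i), \qquad z_i = \frac{y_i - \beta_0 - \beta_1 x_i}{\delta} $$

where y_i = log â_i, x_i = log a_i, and φ is the standard normal density.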

Only slight modification of this definition is required to address censored observations. In the case of right-censored observations, the likelihood is simply the proportion of the distribution, centered at y = β0 + β1x, which lies above the censoring value, y_sat. Similarly, for left-censored data, the likelihood is the proportion of the distribution below y_th.

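The complete likelihood for all three situations is then

$$ L(\beta_0, \beta_1, \delta;\, \mathbf{x}, \mathbf{y}) = \prod_{i=1}^{m} \left[1 - Q(z_{th,i})\right] \prod_{i=m+1}^{m+r} \frac{1}{\delta}\,\phi(z_i) \prod_{i=m+r+1}^{n} Q(z_{sat,i}) $$

where Q(z) = 1 − Φ(z) is the standard normal survivor function, z_th,i = (y_th − β0 − β1x_i)/δ and z_sat,i = (y_sat − β0 − β1x_i)/δ.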


The likelihood will reach a maximum when its first derivatives with respect to the model parameters are zero. Since the logarithm is a monotonic function, the maximum of the log likelihood will coincide with that of the likelihood itself. Taking the logarithm of L(β0, β1, δ; x, y) greatly simplifies the subsequent differentiations by reducing the series of products to one of sums. The log likelihood, omitting an additive constant, is


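$$ \log L(\beta_0, \beta_1, \delta;\, \mathbf{x}, \mathbf{y}) = \sum_{i=1}^{m} \log\left[1 - Q(z_{th,i})\right] - r\log\delta - \frac{1}{2\delta^2}\sum_{i=m+1}^{m+r}\left(y_i - \beta_0 - \beta_1 x_i\right)^2 + \sum_{i=m+r+1}^{n} \log Q(z_{sat,i}) \qquad \text{[C-6]} $$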


It is necessary to find β̂0, β̂1, δ̂ such that the first partial derivatives of the log likelihood in equation C-6 are zero. The vector of these partial derivatives is referred to as the score.
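Equation C-6 translates almost directly into code. The sketch below assumes natural logarithms and that a censored observation is recorded at or beyond its limit, so that any log response at or below y_th is treated as obscured by noise and any at or above y_sat as saturated. The function names are hypothetical, and the constant −(r/2) log 2π is dropped because it does not move the maximum.

```python
import math

def log_Q(z):
    """Log of the standard normal survivor function Q(z) = 1 - Phi(z)."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_loglik(beta0, beta1, delta, x, y, y_th, y_sat):
    """Equation C-6 without its additive constant.
    x: log crack sizes, y: log responses. A response at or below y_th is
    treated as obscured by noise and one at or above y_sat as saturated."""
    ll, r = 0.0, 0
    for xi, yi in zip(x, y):
        mean = beta0 + beta1 * xi
        if yi <= y_th:
            # left-censored term: log[1 - Q(z_th)] = log Q(-z_th)
            ll += log_Q(-(y_th - mean) / delta)
        elif yi >= y_sat:
            # right-censored term: log Q(z_sat)
            ll += log_Q((y_sat - mean) / delta)
        else:
            # valid response: contributes -(y - mean)^2 / (2 delta^2)
            r += 1
            ll -= 0.5 * ((yi - mean) / delta) ** 2
    return ll - r * math.log(delta)   # the -r log(delta) term of C-6

x = [math.log(0.003), math.log(0.005), math.log(0.009)]
y = [-1.0, 0.3, 2.5]
print(censored_loglik(7.5, 1.4, 0.4, x, y, y_th=-0.5, y_sat=2.0))
```

Writing 1 − Q(z) as Q(−z) avoids the loss of precision that the subtraction would cause when Q(z) is close to one.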

C.2.5 Estimation Algorithm for â vs. a Data

The parameters which maximize the likelihood equation, C-6, are evaluated iteratively using the following equations.

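The elements of the score, which was mentioned in the preceding section, are:

$$ \frac{\partial \log L}{\partial \beta_0} = \frac{1}{\delta}\left[-\sum_{i=1}^{m} W(z_{th,i}) + \sum_{i=m+1}^{m+r} z_i + \sum_{i=m+r+1}^{n} V(z_{sat,i})\right] $$

$$ \frac{\partial \log L}{\partial \beta_1} = \frac{1}{\delta}\left[-\sum_{i=1}^{m} x_i W(z_{th,i}) + \sum_{i=m+1}^{m+r} x_i z_i + \sum_{i=m+r+1}^{n} x_i V(z_{sat,i})\right] $$

$$ \frac{\partial \log L}{\partial \delta} = \frac{1}{\delta}\left[-\sum_{i=1}^{m} z_{th,i} W(z_{th,i}) + \sum_{i=m+1}^{m+r} \left(z_i^2 - 1\right) + \sum_{i=m+r+1}^{n} z_{sat,i} V(z_{sat,i})\right] $$

where:

V(z) = φ(z)/Q(z)

W(z) = φ(z)/[1 − Q(z)]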
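V(z) and W(z) are the ratios that appear when the censored terms of C-6 are differentiated: V(z) = −d log Q(z)/dz and W(z) = d log[1 − Q(z)]/dz. The short check below confirms this numerically by central differences. It is only a sanity check, not part of the estimation procedure; 1 − Q(z) is evaluated as Q(−z) to avoid cancellation.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Q(z):
    """Standard normal survivor function, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def V(z):
    return phi(z) / Q(z)

def W(z):
    return phi(z) / Q(-z)          # 1 - Q(z) written as Q(-z)

def central_diff(f, z, h=1e-5):
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-2.0, 0.0, 1.5):
    dlogQ = central_diff(lambda t: math.log(Q(t)), z)
    dlog1mQ = central_diff(lambda t: math.log(Q(-t)), z)
    print(f"z={z:+.1f}  V={V(z):.6f}  -dlogQ/dz={-dlogQ:.6f}  "
          f"W={W(z):.6f}  dlog(1-Q)/dz={dlog1mQ:.6f}")
```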


The matrix of negative second partial derivatives of the log likelihood with respect to the model parameters is called the Fisher information matrix. The information matrix is used by the iteration procedure for estimating values for β0, β1 and δ which will maximize equation C-6.

Its inverse estimates the variance-covariance matrix of the model parameters, which is used in placing confidence limits on the POD(a) relationship (see section C.3.2).

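Elements of the Fisher information matrix are estimated by:

$$ I_{\beta_0\beta_0} = \frac{1}{\delta^2}\left[-\sum_{i=1}^{m} \gamma(z_{th,i}) + r + \sum_{i=m+r+1}^{n} A(z_{sat,i})\right] $$

$$ I_{\beta_0\beta_1} = \frac{1}{\delta^2}\left[-\sum_{i=1}^{m} x_i\,\gamma(z_{th,i}) + \sum_{i=m+1}^{m+r} x_i + \sum_{i=m+r+1}^{n} x_i\,A(z_{sat,i})\right] $$

$$ I_{\beta_1\beta_1} = \frac{1}{\delta^2}\left[-\sum_{i=1}^{m} x_i^2\,\gamma(z_{th,i}) + \sum_{i=m+1}^{m+r} x_i^2 + \sum_{i=m+r+1}^{n} x_i^2\,A(z_{sat,i})\right] $$

$$ I_{\beta_0\delta} = \frac{1}{\delta^2}\left[-\sum_{i=1}^{m} \left(z_{th,i}\,\gamma(z_{th,i}) + W(z_{th,i})\right) + 2\sum_{i=m+1}^{m+r} z_i + \sum_{i=m+r+1}^{n} \left(z_{sat,i}\,A(z_{sat,i}) + V(z_{sat,i})\right)\right] $$

$$ I_{\beta_1\delta} = \frac{1}{\delta^2}\left[-\sum_{i=1}^{m} x_i\left(z_{th,i}\,\gamma(z_{th,i}) + W(z_{th,i})\right) + 2\sum_{i=m+1}^{m+r} x_i z_i + \sum_{i=m+r+1}^{n} x_i\left(z_{sat,i}\,A(z_{sat,i}) + V(z_{sat,i})\right)\right] $$

$$ I_{\delta\delta} = \frac{1}{\delta^2}\left[-\sum_{i=1}^{m} \left(z_{th,i}^2\,\gamma(z_{th,i}) + 2z_{th,i}W(z_{th,i})\right) + \sum_{i=m+1}^{m+r} \left(3z_i^2 - 1\right) + \sum_{i=m+r+1}^{n} \left(z_{sat,i}^2\,A(z_{sat,i}) + 2z_{sat,i}V(z_{sat,i})\right)\right] $$

where,

A(z) = V(z) [ V(z) − z ]
γ(z) = −W(z) [ W(z) + z ]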
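Similarly, A(z) and γ(z) are the second derivatives of the censored terms: d² log Q(z)/dz² = −A(z) and d² log[1 − Q(z)]/dz² = γ(z), with γ(z) = −W(z)[W(z) + z]. The sketch below checks this numerically and also confirms that A(z) > 0 and −γ(z) > 0, so each censored observation adds positive information. As before, it is a verification aid only, with 1 − Q(z) evaluated as Q(−z).

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Q(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def A(z):
    v = phi(z) / Q(z)                  # V(z)
    return v * (v - z)

def gamma(z):
    w = phi(z) / Q(-z)                 # W(z), with 1 - Q(z) = Q(-z)
    return -w * (w + z)

def second_diff(f, z, h=1e-4):
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / (h * h)

for z in (-2.0, 0.0, 1.5):
    print(f"z={z:+.1f}  A={A(z):.5f}  "
          f"-d2logQ={-second_diff(lambda t: math.log(Q(t)), z):.5f}  "
          f"gamma={gamma(z):.5f}  "
          f"d2log(1-Q)={second_diff(lambda t: math.log(Q(-t)), z):.5f}")
```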


The variance-covariance matrix of the log â vs log a regression parameter estimates is related to the Fisher information by

$$ \mathrm{Var}(\hat\beta_0, \hat\beta_1, \hat\delta) = \begin{bmatrix} V_{00} & V_{01} & V_{02} \\ V_{10} & V_{11} & V_{12} \\ V_{20} & V_{21} & V_{22} \end{bmatrix} = \left[ I(\hat\beta_0, \hat\beta_1, \hat\delta) \right]^{-1} $$


The elements of this matrix are in terms of the log â vs log a relationship. It is necessary to convert this matrix to the corresponding 2 × 2 variance-covariance matrix of the POD(a) model parameter estimates.
Using a Taylor series expansion about the true values of μ and σ, the appropriate variance-covariance matrix of μ̂ and σ̂ is given by:

$$ \mathrm{Var}(\hat\mu, \hat\sigma) = T\,\mathrm{Var}(\hat\beta_0, \hat\beta_1, \hat\delta)\,T^{\mathsf{T}} $$

and the transformation matrix T is defined by:

$$ T = -\frac{1}{\hat\beta_1}\begin{bmatrix} 1 & \hat\mu & 0 \\ 0 & \hat\sigma & -1 \end{bmatrix} $$

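Performing the indicated matrix operations provides estimates of the variances and covariances of μ̂ and σ̂ as

$$ \mathrm{Var}(\hat\mu) = \frac{1}{\hat\beta_1^2}\left[V_{00} + 2\hat\mu V_{01} + \hat\mu^2 V_{11}\right] $$

$$ \mathrm{Cov}(\hat\mu, \hat\sigma) = \frac{1}{\hat\beta_1^2}\left[\hat\sigma V_{10} - V_{20} - \hat\mu V_{12} + \hat\mu\hat\sigma V_{11}\right] $$

$$ \mathrm{Var}(\hat\sigma) = \frac{1}{\hat\beta_1^2}\left[V_{22} - 2\hat\sigma V_{21} + \hat\sigma^2 V_{11}\right] $$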


Inverting this 2 × 2 variance-covariance matrix produces the 2 × 2 Fisher information matrix used to place lower bounds on POD(a) curves, as discussed later in Appendix C.
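A minimal sketch of this transformation follows, assuming the 3 × 3 matrix V has already been obtained by inverting the Fisher information, and assuming μ = (log â_th − β0)/β1 and σ = δ/β1, for which −(1/β̂1)[[1, μ̂, 0], [0, σ̂, −1]] is the matrix of partial derivatives of (μ, σ) with respect to (β0, β1, δ). The leading −1/β̂1 is what produces the 1/β̂1² in the element-by-element expressions. The input numbers are made up purely to exercise the code.

```python
def pod_param_covariance(beta1, mu, sigma, V):
    """Var(mu_hat, sigma_hat) = T V T^T with
    T = -(1/beta1) [[1, mu, 0], [0, sigma, -1]], where V is the 3x3
    variance-covariance matrix of (beta0_hat, beta1_hat, delta_hat)."""
    T = [[-1.0 / beta1, -mu / beta1, 0.0],
         [0.0, -sigma / beta1, 1.0 / beta1]]
    TV = [[sum(T[i][k] * V[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
    return [[sum(TV[i][k] * T[j][k] for k in range(3)) for j in range(2)]
            for i in range(2)]

# Illustrative (made-up) inputs, only to exercise the function.
beta1, mu, sigma = 1.42, -5.30, 0.27
V = [[0.040, 0.0070, 0.0010],
     [0.0070, 0.0013, 0.0002],
     [0.0010, 0.0002, 0.0025]]
C = pod_param_covariance(beta1, mu, sigma, V)
print(C)
```

Multiplying the matrices explicitly, rather than coding the three element formulas, keeps the sketch valid if T is ever extended to more parameters.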


C.2.6  Newton-Raphson  Iteration: 

The Newton-Raphson iteration finds a zero of a function by (grossly) approximating the function with a tangent plane at a point, and solving directly for the zero of the plane. Then the function is evaluated at this zero point. If the function itself is not zero, the process is repeated using this new point as the reference. The function in this instance is the score vector, the derivatives of the likelihood with respect to the model parameters. When these derivatives are zero, the likelihood will be at its maximum. The coordinates of the zero point, (β̂0, β̂1, δ̂)^T, are therefore the maximum likelihood estimates for the model parameters.
Let θ̂_k = (β̂0, β̂1, δ̂)_k^T be the vector of parameter estimates after k iterations,

let U(θ̂_k) be the score vector, and

let I(θ̂_k) be the Fisher information matrix, as described above.

The Newton-Raphson procedure uses uncensored MLEs as initial guesses, and solves

$$ \hat\theta_{k+1} = \hat\theta_k + \left[ I(\hat\theta_k) \right]^{-1} U(\hat\theta_k) $$

iteratively until

|θ̂_{k+1} − θ̂_k| < ε1,

or until

|U(θ̂_{k+1})| < ε2,

where ε1 and ε2 are convergence criteria.
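The procedure can be sketched in plain Python as below. This is an illustration under stated assumptions, not the standard software referred to in this document: natural logarithms are used, censored responses are assumed to be recorded at their limits, the information matrix is the observed (negative Hessian) form obtained by differentiating C-6 twice and written with V, W, A and γ, a step is halved if it would make δ non-positive, and all function names and the synthetic data are hypothetical.

```python
import math
import random

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Q(z):  # standard normal survivor function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def solve3(M, b):
    """Solve the 3x3 system M s = b by Gaussian elimination with pivoting."""
    A = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, 3):
            f = A[r][c] / A[c][c]
            for k in range(c, 4):
                A[r][k] -= f * A[c][k]
    s = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s[r] = (A[r][3] - sum(A[r][k] * s[k] for k in range(r + 1, 3))) / A[r][r]
    return s

def score_and_info(theta, x, y, y_th, y_sat):
    """Score U and observed information I for theta = (beta0, beta1, delta)."""
    b0, b1, d = theta
    U = [0.0, 0.0, 0.0]
    I = [[0.0] * 3 for _ in range(3)]
    for xi, yi in zip(x, y):
        if yi <= y_th:                       # noise: log[1 - Q(z_th)]
            z = (y_th - b0 - b1 * xi) / d
            w = phi(z) / Q(-z)               # W(z)
            g = -w * (w + z)                 # gamma(z)
            sb, sd = -w, -z * w
            ibb, ibd, idd = -g, -(z * g + w), -(z * z * g + 2 * z * w)
        elif yi >= y_sat:                    # saturated: log Q(z_sat)
            z = (y_sat - b0 - b1 * xi) / d
            v = phi(z) / Q(z)                # V(z)
            a = v * (v - z)                  # A(z)
            sb, sd = v, z * v
            ibb, ibd, idd = a, z * a + v, z * z * a + 2 * z * v
        else:                                # valid signal response
            z = (yi - b0 - b1 * xi) / d
            sb, sd = z, z * z - 1.0
            ibb, ibd, idd = 1.0, 2 * z, 3 * z * z - 1.0
        h = (1.0, xi)                        # d mean / d beta0, d mean / d beta1
        for j in range(2):
            U[j] += h[j] * sb / d
            I[j][2] += h[j] * ibd / d ** 2
            for k in range(2):
                I[j][k] += h[j] * h[k] * ibb / d ** 2
        U[2] += sd / d
        I[2][2] += idd / d ** 2
    I[2][0], I[2][1] = I[0][2], I[1][2]
    return U, I

def uncensored_start(x, y, y_th, y_sat):
    """Initial guess: ML (divisor r) regression on the valid responses only."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if y_th < yi < y_sat]
    r = len(pts)
    xb = sum(p[0] for p in pts) / r
    yb = sum(p[1] for p in pts) / r
    b1 = (sum((p[0] - xb) * (p[1] - yb) for p in pts)
          / sum((p[0] - xb) ** 2 for p in pts))
    b0 = yb - b1 * xb
    d = math.sqrt(sum((p[1] - b0 - b1 * p[0]) ** 2 for p in pts) / r)
    return [b0, b1, d]

def newton_raphson(x, y, y_th, y_sat, eps1=1e-10, eps2=1e-8, max_iter=50):
    theta = uncensored_start(x, y, y_th, y_sat)
    for _ in range(max_iter):
        U, I = score_and_info(theta, x, y, y_th, y_sat)
        if max(abs(u) for u in U) < eps2:    # score criterion (epsilon_2)
            return theta
        step = solve3(I, U)                  # [I]^-1 U
        while theta[2] + step[2] <= 0.0:     # keep delta positive
            step = [s / 2.0 for s in step]
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < eps1: # step criterion (epsilon_1)
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# Synthetic illustration (made-up parameters), censored at both ends.
rng = random.Random(1)
x = [math.log(rng.uniform(0.002, 0.02)) for _ in range(60)]
y_th, y_sat = -0.3, 2.0
y = [min(max(7.5 + 1.4 * xi + rng.gauss(0.0, 0.4), y_th), y_sat) for xi in x]
theta = newton_raphson(x, y, y_th, y_sat)
print("beta0, beta1, delta =", theta)
```

Applied with no censoring (y_th = −∞, y_sat = +∞), the iteration stops at once at its starting point, the least-squares fit with divisor n, which is a convenient check on the score expressions.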

Examples 

Data in Table C-1 and similar data for nine other inspections were analyzed using the parameter estimation procedure described here. The test was a One-Factor-at-a-Time design. (Designs of NDE demonstration and evaluation experiments are discussed in Appendix D.)

The inspections designated A1, B1, B2, B3 are repeated evaluations of the (unchanged) NDE system. The same operator performed all four inspections using the same eddy current probe. Next, the inspection probe (and therefore the system calibration parameters) was changed; this inspection is designated C. Inspections G and H changed the physical orientation of the fatigue-cracked specimens being inspected; all other system parameters were identical to inspection C. Finally, inspections I1, I2, I3 were performed by a new operator. Results are summarized in Table C-2. A representative plot of the POD vs a relationship (Test A1) is provided as Figure C-4.


C.3  Hit/Miss  Analysis 

Fluorescent penetrant testing (PT), magnetic particle testing (MT), and ultrasonic testing (UT) tend to be characterized by their binary nature: either the crack is detected (hit, or 1) or it is not (miss, or 0). Unlike eddy current inspection data, for which some crack size information is available, PT, MT, and UT data are usually hit/miss only. This presents an analysis difficulty, since it precludes using the â vs. a procedure: there is no â. The â vs. a analysis, discussed in detail previously, is based on a normal distribution of apparent size, â, for a crack of actual size a, the model parameters being estimated by maximizing the likelihood of the test results based on this normal distribution. By comparison, PT, MT, and UT data are binomial in nature, with detection probability given by POD(a). Maximum likelihood is again used to estimate the parameters of the model. The idea in both cases is to select model parameter estimates such that the likelihood is maximized based on the model, given the actual data observed.
Table C-2
Model Parameters for Semi-Automated Inspections

Test   a50       σ        β0       β1       δ        n1; n2; n3
A1     0.00498   0.2693   7.5271   1.4195   0.3822   30; 3; 2
B1     0.00526   0.2343   7.7306   1.4733   0.3452   30; 3; 2
B2     0.00489   0.2642   7.9070   1.4863   0.3926   30; 3; 2
B3     0.00473   0.3070   7.3941   1.3812   0.4240   30; 3; 2
C      0.00474   0.1968   8.4873   1.5859   0.3120   30; 3; 4
G      0.00484   0.2549   7.6671   1.4384   0.3666   30; 3; 3
H      0.00503   0.3070   7.7186   1.4585   0.4477   30; 4; 2
I1     0.00557   0.2379   7.7638   1.4956   0.3558   30; 4; 3
I2     0.00520   0.2012   8.2517   1.5691   0.3157   30; 3; 4
I3     0.00596   0.4662   7.2437   1.4142   0.6594   30; 6; 1

Notes:

1. a50 = e^μ, crack size at 50% POD.

2. Inspections A1, B1, B2, B3 are operator 1, repeat tests. Probe and system calibration unchanged.

3. Inspection C changed probe.

4. Inspections G and H changed specimen orientations.

5. Inspections I1, I2, and I3 are operator 2, repeat tests.

6. n1 = total observations, n2 = data in noise, n3 = saturations.
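The columns of Table C-2 can be cross-checked. Assuming σ = δ/β1 and μ = (log â_th − β0)/β1 (relations consistent with the transformation matrix T of section C.2.5), the sketch below recomputes σ from δ and β1, the log decision threshold implied by a50 = exp(μ), and a90 from the formula a90 = exp(μ) exp(1.282σ) given in Appendix D. For every row δ/β1 agrees with the tabulated σ to within about 1e-4, and the implied log â_th is within about 0.002 of zero; that all ten inspections share a threshold of roughly â_th = 1 in the response units is an inference from the table, not a statement of the source.

```python
import math

# (test, a50, sigma, beta0, beta1, delta) as printed in Table C-2
TABLE_C2 = [
    ("A1", 0.00498, 0.2693, 7.5271, 1.4195, 0.3822),
    ("B1", 0.00526, 0.2343, 7.7306, 1.4733, 0.3452),
    ("B2", 0.00489, 0.2642, 7.9070, 1.4863, 0.3926),
    ("B3", 0.00473, 0.3070, 7.3941, 1.3812, 0.4240),
    ("C", 0.00474, 0.1968, 8.4873, 1.5859, 0.3120),
    ("G", 0.00484, 0.2549, 7.6671, 1.4384, 0.3666),
    ("H", 0.00503, 0.3070, 7.7186, 1.4585, 0.4477),
    ("I1", 0.00557, 0.2379, 7.7638, 1.4956, 0.3558),
    ("I2", 0.00520, 0.2012, 8.2517, 1.5691, 0.3157),
    ("I3", 0.00596, 0.4662, 7.2437, 1.4142, 0.6594),
]

for test, a50, sigma, b0, b1, delta in TABLE_C2:
    sigma_check = delta / b1                # sigma = delta / beta1
    log_ath = b0 + b1 * math.log(a50)       # log ahat_th implied by mu = log(a50)
    a90 = a50 * math.exp(1.282 * sigma)     # a90 = exp(mu) exp(1.282 sigma)
    print(f"{test:2s}  delta/beta1 = {sigma_check:.4f} (table {sigma:.4f})"
          f"  implied log ahat_th = {log_ath:+.4f}  a90 = {a90:.5f}")
```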


For hit/miss testing, the likelihood of P_i, based on a single observation, is:

$$ L(P_i;\, a_i, x_i) = P_i^{\,x_i}\left(1 - P_i\right)^{1 - x_i} \qquad \text{[C-7]} $$

where P_i is the probability of detection of crack size a_i, and x_i is the inspection outcome, 0 for miss, 1 for hit. (Notice that when the exponent of P_i is one, that of (1 − P_i) is zero, and so that factor, (1 − P_i)^0, reduces to multiplication by one. Similarly with P_i^x when x is zero.) P_i is a function of crack size, a_i, and the lognormal model can be used to relate P_i = POD(a_i) to crack size.

The model formulation is

$$ P_i = \mathrm{POD}(a_i) = 1 - Q(z_i) \qquad \text{[C-8]} $$

where

Q(z_i) is the standard normal survivor function,

z_i = (log a_i − μ)/σ is the standard normal variate,

μ, σ are the location and scale parameters,

and θ = [μ, σ]^T.

The log odds function, which is an approximation to the lognormal, is often suggested in similar situations to model binary data. The lognormal model is used here to be consistent with the POD(a) model resulting from â vs. a data.
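To see how close the two models are, the sketch below compares the lognormal POD of equation C-8 with a log-odds (logistic) POD whose scale is matched by the conventional factor 1.702. That factor is a standard textbook approximation, not something taken from this document, and μ and σ are illustrative values of the same order as those in Table C-2. Over the whole range the two curves differ by less than 0.01 in POD.

```python
import math

def lognormal_pod(a, mu, sigma):
    """Equation C-8: POD(a) = 1 - Q(z) = Phi(z), z = (log a - mu) / sigma."""
    z = (math.log(a) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def log_odds_pod(a, mu, sigma, k=1.702):
    """Log-odds (logistic) POD with its scale matched to the normal by k."""
    z = (math.log(a) - mu) / sigma
    return 1.0 / (1.0 + math.exp(-k * z))

mu, sigma = math.log(0.005), 0.27      # illustrative values
sizes = [0.005 * math.exp(0.01 * i) for i in range(-300, 301)]
max_gap = max(abs(lognormal_pod(s, mu, sigma) - log_odds_pod(s, mu, sigma))
              for s in sizes)
print(f"largest POD difference over the range: {max_gap:.4f}")
```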

Recall that P_i is the probability of detecting crack size a_i and is given as P_i = POD(a_i) in equation C-8. The outcome of the i th inspection, x_i, is either a one for a hit or a zero for a miss. The likelihood of two independent events (inspections) is the product of their individual likelihoods.

The overall likelihood of having observed all the data is, then, the product of their individual likelihoods. So for hit/miss data the likelihood is


$$ L(\theta;\, \mathbf{a}, \mathbf{x}) = \prod_{i=1}^{h} P_i \;\prod_{j=1}^{n-h} \left(1 - P_j\right) \qquad \text{[C-9]} $$

where the likelihood of the h hits is the first term of equation C-9, and the second term is the likelihood of the n − h misses. (Note that P(miss) = 1 − P(hit).)

Now, values for μ and σ in equation C-8 can be selected to maximize the likelihood, equation C-9. Taking the natural logarithm of equation C-9 changes the series of products into a series of sums. The log likelihood is given as equation C-10:

$$ \log L(\theta;\, \mathbf{a}, \mathbf{x}) = \sum_{i=1}^{h} \log P_i + \sum_{j=1}^{n-h} \log\left(1 - P_j\right) \qquad \text{[C-10]} $$

Because the logarithm is a monotonic function, the maximum of the log likelihood will coincide with the maximum of the likelihood itself. Therefore equation C-10 can now be differentiated with respect to μ and σ, the derivatives set equal to zero, and the resulting two equations solved simultaneously. In practice it is convenient to perform these differentiations numerically rather than algebraically (as was done in the â vs. a case). As with the â vs. a analysis, the negative second partial derivatives of the log likelihood provide the Fisher information matrix, used to place confidence bounds on the POD(a) relationship.
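A minimal sketch of that numerical approach follows: C-10 is coded directly, its gradient and Hessian with respect to (μ, σ) are taken by central differences, and Newton-Raphson steps are applied. The safeguards (a steepest-ascent fallback when the numerical information is not positive definite, step-halving that keeps σ above half its current value and never lowers the likelihood, and a floor on the probabilities inside the logarithm) are additions for robustness, not part of the source; the starting rule, function names and synthetic data are hypothetical as well.

```python
import math
import random

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def hitmiss_loglik(theta, a, x):
    """Equation C-10: log P_i summed over hits plus log(1 - P_i) over misses,
    with P_i = Phi((log a_i - mu) / sigma) as in equation C-8."""
    mu, sigma = theta
    ll = 0.0
    for ai, xi in zip(a, x):
        z = (math.log(ai) - mu) / sigma
        p = Phi(z) if xi == 1 else Phi(-z)           # 1 - Phi(z) = Phi(-z)
        ll += math.log(max(p, 1e-300))                # floor avoids log(0)
    return ll

def num_grad_hess(f, t, h=1e-4):
    """Central-difference gradient and Hessian of f at t = [mu, sigma]."""
    def at(d0, d1):
        return f([t[0] + d0, t[1] + d1])
    f0 = at(0.0, 0.0)
    g = [(at(h, 0.0) - at(-h, 0.0)) / (2 * h), (at(0.0, h) - at(0.0, -h)) / (2 * h)]
    h00 = (at(h, 0.0) - 2 * f0 + at(-h, 0.0)) / h ** 2
    h11 = (at(0.0, h) - 2 * f0 + at(0.0, -h)) / h ** 2
    h01 = (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4 * h ** 2)
    return g, [[h00, h01], [h01, h11]]

def fit_hitmiss(a, x, theta0, tol=1e-7, max_iter=200):
    """Newton-Raphson on (mu, sigma); returns estimates and 2x2 information."""
    f = lambda th: hitmiss_loglik(th, a, x)
    t = list(theta0)
    for _ in range(max_iter):
        g, H = num_grad_hess(f, t)
        i00, i01, i11 = -H[0][0], -H[0][1], -H[1][1]   # information = -Hessian
        det = i00 * i11 - i01 ** 2
        if i00 > 0 and det > 0:                        # Newton step I^-1 g
            step = [(i11 * g[0] - i01 * g[1]) / det, (i00 * g[1] - i01 * g[0]) / det]
        else:                                          # fallback: steepest ascent
            step = [0.01 * g[0], 0.01 * g[1]]
        f_t = f(t)
        # Halve the step while it shrinks sigma too far or lowers the likelihood.
        while max(map(abs, step)) > tol and (
                t[1] + step[1] <= 0.5 * t[1] or f([t[0] + step[0], t[1] + step[1]]) < f_t):
            step = [s / 2 for s in step]
        t = [t[0] + step[0], t[1] + step[1]]
        if max(map(abs, step)) <= tol:
            return t, [[i00, i01], [i01, i11]]
    raise RuntimeError("Newton-Raphson did not converge")

# Synthetic hit/miss data (made-up true values mu = log 0.005, sigma = 0.4).
rng = random.Random(2)
a = [rng.uniform(0.001, 0.02) for _ in range(200)]
x = [1 if rng.random() < Phi((math.log(ai) - math.log(0.005)) / 0.4) else 0 for ai in a]
hits = [math.log(ai) for ai, xi in zip(a, x) if xi == 1]
misses = [math.log(ai) for ai, xi in zip(a, x) if xi == 0]
start = [0.5 * (sum(hits) / len(hits) + sum(misses) / len(misses)), 0.5]
theta, info = fit_hitmiss(a, x, start)
print("mu, sigma =", theta, " a50 =", math.exp(theta[0]))
```

The returned information matrix is the quantity that the confidence-bound calculation of section C.4 inverts.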


C.4  POD  vs  a  Confidence  Bounds 

Confidence bounds can be placed on the POD vs. a relationship by taking advantage of the asymptotically normal behavior of the maximum likelihood estimators. ML estimators, θ̂, have an asymptotically multivariate normal distribution with mean θ and variance-covariance matrix [I(θ)]^{-1} (cf. Kendall and Stuart, 1961, or Cramér, 1946), and consequently

$$ \Omega(\theta) = (\hat\theta - \theta)^{\mathsf{T}}\, I(\theta)\, (\hat\theta - \theta) \qquad \text{[C-11]} $$

is asymptotically a chi-squared variable with k degrees of freedom for a k-parameter model. The expected Fisher information for a two-parameter normal model is estimated as part of the ML parameter estimation procedure.

Since the POD model is a cdf, 1 − Q(x; θ), the Cheng and Iles (1983, 1988) method of placing confidence bounds on a cdf can be applied to the POD equation.

Let μ and σ be the cdf location and scale parameters, respectively, and define C to be their confidence region. From equation C-11 it is seen that as (μ, σ) vary about (μ̂, σ̂) within C, they describe an elliptical boundary for a given Ω. As μ and σ move about within this region, the cdf (and therefore POD(a)) changes.

Now consider x_p, the p th quantile, which is defined by P[x ≤ x_p] = 1 − Q(x_p; θ) = p.

For a fixed p, allow θ to vary within C and examine the behavior of x_p.

For a normal cdf, the p th quantile is given by

(x_p − μ)/σ = Q^{-1}(1 − p) = τ, say,

and so

$$ x_p = \mu + \tau\sigma \qquad \text{[C-12]} $$

All combinations θ within C can be evaluated with equation C-12 while holding p (and therefore τ) constant.

Now, x_p will achieve its extreme values along the boundary of C, as given by equation C-11. The largest log crack size, x_p(max), which satisfies both equations C-11 and C-12 can be calculated using the method of Lagrangian multipliers. The Lagrangian is:

$$ \Lambda(x_p, \eta, \theta) = x_p + \eta\,\Omega(\mu, \sigma) \qquad \text{[C-13]} $$

where Ω(μ, σ) is given by equation C-11, x_p by equation C-12, and η is the Lagrangian multiplier. Differentiating equation C-13 with respect to θ, equating the derivatives to zero, and then eliminating η provides the necessary equations for determining x_p(max). By repeating the evaluation of x_p(max) for all p, the desired confidence band on POD(a) can be constructed.

The  95%  lower  confidence  bound  on  POD  illustrated  in  Figure  C-4  was  determined  in  this 
fashion  using  the  standard  software. 
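For the two-parameter model the Lagrangian calculation has a closed form: maximizing the linear function x_p = cᵀθ, with c = (1, τ)ᵀ, over the ellipse (θ − θ̂)ᵀ I (θ − θ̂) = χ² gives x_p(max) = μ̂ + τσ̂ + sqrt(χ²(V_μμ + 2τV_μσ + τ²V_σσ)), where V = I⁻¹. The sketch below implements this; exp(x_p(max)) is the crack size at which the lower bound on POD equals p. The critical value χ² is left as a parameter because the document does not state which value the standard software uses, so the 5.991 (95th percentile of χ² with 2 degrees of freedom) in the usage lines is an assumption, as are the estimates and covariance matrix.

```python
import math
from statistics import NormalDist

def x_p_max(p, mu, sigma, cov, chi2):
    """Largest log crack size x_p = mu + tau*sigma over the ellipse
    (theta - theta_hat)^T I (theta - theta_hat) = chi2, where cov = I^-1
    and tau = Phi^-1(p) = Q^-1(1 - p)."""
    tau = NormalDist().inv_cdf(p)
    var = cov[0][0] + 2.0 * tau * cov[0][1] + tau * tau * cov[1][1]
    return mu + tau * sigma + math.sqrt(chi2 * var)

# Illustrative (made-up) estimates and covariance; chi2 = 5.991 is an assumed choice.
mu, sigma, chi2 = math.log(0.005), 0.27, 5.991
cov = [[0.0025, 0.0004], [0.0004, 0.0015]]
for p in (0.5, 0.9):
    a_p = math.exp(mu + NormalDist().inv_cdf(p) * sigma)    # best estimate
    a_p_upper = math.exp(x_p_max(p, mu, sigma, cov, chi2))  # bound on a_p
    print(f"p={p:.1f}  a_p={a_p:.5f}  a_p at lower POD bound={a_p_upper:.5f}")
```

Sweeping p from near 0 to near 1 and plotting p against exp(x_p(max)) traces the whole lower confidence curve.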
Residuals of 10 Inspections are Approximately Normally Distributed
APPENDIX D

ASSESSING  SYSTEM  CAPABILITY 


This appendix addresses the methods for assuring that the estimated POD(a) curve is a valid representation of NDE system capability. It includes tests of model and data compliance, as well as statistical methods for comparing POD(a) relationships to assure that they can be combined to represent the entire NDE system.

The POD(a) characterization of capability is summarized by the model parameters, μ and σ, and represented by the resulting POD(a) curve. The lower bound, discussed in Appendix C, reflects the statistical uncertainty of the estimate of the POD(a) function. The estimate and its lower confidence bound are compared with the system requirements as specified by the CDRL. In some instances these requirements will not have been met. Ancillary investigations described here may be required to isolate the cause(s) of inadequate system capability so that remedial action may be undertaken.

D.1  Statistical  Tests  for  Model  Compliance 

Decisions made about the capability of the system to meet its requirements are based on the POD model. Before these decisions can be made, the “goodness” of the POD model must be assessed. If the model fails these tests, then the decisions made regarding the system through use of the model may be erroneous. The NDE reliability analyses are based on the assumption that the relationship between crack size and the probability of detection can be modeled by the cumulative lognormal distribution function. The analysis programs will usually (but not always) produce answers even if this assumption is not reasonable. Therefore, consideration must be given to the viability of the model in each new application. Different approaches to validating the model are required for the â vs. a data and hit/miss data.

D.1.1 â vs. a Model Compliance

The  cumulative  lognormal  function  for  POD(a)  was  derived  by  assuming  that: 

a. the mean of log â is a linear function of log a;

b.  the  regression  residuals  are  normally  distributed  with  zero  mean;  and, 

c.  the  standard  deviation  of  the  residuals  is  constant  for  all  values  of  a. 

As a minimum, these assumptions must be subjectively evaluated by a visual examination of a
plot of log â vs. log a for each data set. In general, regression analysis methods are robust
with  respect  to  the  assumptions  of  normality  and  constant  standard  deviation  of  the  residuals. 
There  are  also  standard  statistical  tests  of  these  assumptions  which  can  be  used  to  remove 
subjectivity  from  the  validation  of  the  assumptions.  However,  it  should  be  noted  that  the  tests 
for  constant  variance  and  normality  of  the  residuals  are  relatively  insensitive  for  the 
recommended  minimum  number  of  cracks  in  NDE  reliability  experiments.  If  any  of  the  basic 
assumptions  are  not  valid,  the  discrepancies  must  be  noted  on  all  reported  parameter  values 
and  plots  derived  from  the  data  using  the  standard  analysis  method. 

When the log response signal is not linear with log crack size, it is likely to be concave downward at the larger crack sizes. Ignoring this type of nonlinearity results in values of a50 that are too small and values of σ that are too large. This combination of wrong parameter values will yield overestimates of POD at small crack sizes and underestimates of POD at large crack sizes. Restricting the range of crack sizes in the analysis may correct this difficulty when the linear range extends to crack sizes which produce very high probabilities of detection.
For the POD(a) model to be sensible, it is also necessary that the slope of the log â vs. log a line be positive. The standard computer program checks for a positive slope. If the slope of the log â vs. log a line is negative, the signal response is not an appropriate metric for making a hit/miss decision in the NDE system, as the POD(a) function decreases with crack size. (If this occurs, the NDE system should not have reached the capability evaluation stage.) If the slope is positive but not significantly greater than zero, the lower confidence bound on the POD(a) function will not be monotonic and will eventually curve down. In this case the computer program will not produce a lower bound for the POD(a) function and will output the message 'INADEQUATE FIT TO THE POD MODEL'.

It should be noted that it is possible to develop a POD(a) function from different sets of assumptions regarding the â vs. a relation. However, these have not been implemented.

D.1.2  Hit  /  Miss  Model  Compliance 

Because 0/1 data cannot be easily plotted as decimal fractions, assessing the goodness-of-fit
of the POD model is less straightforward than with â vs. a data. When there are several
inspections  of  the  same  crack,  a  plot  of  the  estimated  POD(a)  function  can  be  superimposed 
on  the  observed  detection  proportions  for  each  crack  in  the  experiment.  The  comparison  of 
model  to  data  will  be  based  on  a  subjective  comparison  of  the  fit.  If  only  one  inspection  has 
been  performed  on  each  crack,  the  observed  data  will  all  be  plotted  at  0  or  1  and  the 
comparison  of  model  to  data  is  difficult.  If  multiple  inspections  have  been  performed  on  each 
crack,  there  should  be  data  points  in  the  range  of  increase  of  the  POD(a)  function.  In  this  case 
the  subjective  evaluation  of  the  fit  is  easier. 

There are two experimental situations in the hit/miss analysis which permit a less subjective evaluation of the cumulative lognormal model. If each crack in the experiment was inspected a large number of times, or if a very large number of different cracks were used in the NDE reliability experiment, then the applicability of the model can be checked by the linearity of the log of the odds of detection versus the log of crack size:

$$ \log\left[\frac{\mathrm{POD}(a)}{1 - \mathrm{POD}(a)}\right] = c_0 + c_1 \log a $$

where c0 and c1 are the intercept and slope, respectively.


The  cumulative  lognormal  distribution  function  is  approximated  by  the  log-odds  model. 

If a large number (say more than 20) of inspections were performed on each crack, reasonable detection probabilities would be available for the cracks in the range of increase of the POD(a) function (assuming such crack sizes were in the experiment). Similarly, if a large number of different cracks (say more than 200) were used in the experiment, they could be grouped into independent size ranges and the detection probability assigned to the midpoint of each range. A plot of the log of the odds versus log crack size would provide an indication of the linearity of the relation (either subjectively or statistically evaluated).
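The grouping check described above can be sketched as follows. Crack sizes are binned into ranges, each range's detection proportion is converted to log odds and assigned to the arithmetic midpoint of the range, and a straight line is fitted to log odds versus log midpoint. Ranges with no hits or no misses are skipped because their log odds is infinite; that handling, like the function names, is an assumption of this sketch. The tiny data set only shows the mechanics; the check itself needs the large samples noted above.

```python
import math

def grouped_log_odds(a, x, edges):
    """Group hit/miss results into crack-size ranges [edges[k], edges[k+1]) and
    return (log of range midpoint, log odds of detection) for each range.
    Ranges with detection proportion 0 or 1 are skipped (infinite log odds)."""
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        outcomes = [xi for ai, xi in zip(a, x) if lo <= ai < hi]
        if not outcomes:
            continue
        p = sum(outcomes) / len(outcomes)
        if 0.0 < p < 1.0:
            points.append((math.log(0.5 * (lo + hi)), math.log(p / (1.0 - p))))
    return points

def fit_line(points):
    """Least-squares intercept c0 and slope c1 of log odds versus log a."""
    n = len(points)
    ub = sum(u for u, _ in points) / n
    vb = sum(v for _, v in points) / n
    c1 = (sum((u - ub) * (v - vb) for u, v in points)
          / sum((u - ub) ** 2 for u, _ in points))
    return vb - c1 * ub, c1

a = [0.003, 0.003, 0.003, 0.003, 0.007, 0.007, 0.007, 0.007]
x = [1, 0, 0, 0, 1, 1, 1, 0]
points = grouped_log_odds(a, x, [0.002, 0.004, 0.006, 0.008])
c0, c1 = fit_line(points)
print(points, c0, c1)
```

With many ranges, systematic curvature of the points about the fitted line would argue against the cumulative lognormal model.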

There  are  other  methods  for  evaluating  goodness-of-fit  for  dichotomous  data,  and  some 
statistical  data  analysis  software  packages,  such  as  SAS,  have  algorithms  for  assessing 
goodness-of-fit  for  binary  data. 

D.2  Drawing  Conclusions  from  the  Overall  POD(a) 

The  NDE  evaluation  experiment  has  been  designed  to  establish  the  capability  of  the  NDE 
system  in  terms  of  a  representative  POD(a)  curve  and  its  lower  95  percent  confidence  bound. 
The  capability  of  the  NDE  system  is  then  compared  to  the  requirements  as  specified  in  the 
CDRL.  If  the  system  fails  to  meet  the  requirements,  a  properly  designed  evaluation  experiment 
may  provide  the  information  required  to  identify  the  source  of  the  problem.  If  the  evaluation 
experiment  was  not  properly  designed,  it  may  be  necessary  to  conduct  additional  experiments  to 
isolate  the  cause(s)  of  the  non-compliance. 
The CDRL capability requirements are typically expressed in terms of the flaw size which corresponds to a high probability of detection. The requirement may be stated for the best estimate of the capability (as quantified by the POD function) or for a conservative capability evaluation (as quantified by the lower 95 percent bound on the POD function). The best estimate of the POD(a) function is completely determined from μ̂ and σ̂, the estimates of the parameters μ and σ. The lower 95 percent confidence bound depends both on μ̂ and σ̂ and on the variance-covariance matrix which measures the statistical (sampling) variation in the estimates of μ and σ. The larger the number of flaws in the experiment, the closer the confidence bound is to the estimate.

The parameter μ defines the crack size which is detected 50 percent of the time, a50 = exp(μ). This crack size is defined as the median detectable crack size of the system. Under the lognormal POD(a) model of this document, the crack size which is detected p percent of the time is given by a_p = exp(μ) exp(z_p σ), where z_p is the pth percentile of the standard normal distribution. For example, a90 = exp(μ) exp(1.282σ). If POD(a) is plotted against log a, increasing μ with σ fixed shifts the function to the right without changing its shape. Increasing σ with μ fixed holds the location (the median detectability) but flattens the curve (larger flaw sizes are required to reach a fixed POD above 50 percent).
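The effect of σ described above is easy to see numerically. The sketch below evaluates a_p = exp(μ) exp(z_p σ) using the exact normal percentile from Python's statistics.NormalDist (z_0.90 ≈ 1.2816, which the text rounds to 1.282); the median size μ = log 0.005 is an illustrative value. The printed a50 stays fixed while a90 grows with σ.

```python
import math
from statistics import NormalDist

def crack_size_at_pod(p, mu, sigma):
    """a_p = exp(mu) * exp(z_p * sigma), z_p the p-th standard normal percentile."""
    return math.exp(mu) * math.exp(NormalDist().inv_cdf(p) * sigma)

mu = math.log(0.005)                   # illustrative median detectable size
for sigma in (0.2, 0.3, 0.4):
    a50 = crack_size_at_pod(0.5, mu, sigma)
    a90 = crack_size_at_pod(0.9, mu, sigma)
    print(f"sigma={sigma:.1f}  a50={a50:.5f}  a90={a90:.5f}")
```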

A system will fail to meet requirements if the POD(a) function (or its lower confidence bound) is too low at a specified crack size. To improve the capability, μ or σ will have to be reduced.

(The confidence bound can be tightened by increasing the number of flaws in the evaluation experiment. Note, however, that the larger the value of σ, the more samples are required to achieve equivalent widths of the confidence bounds.) The median detectability, exp(μ), tends to be determined by decision thresholds, while POD flatness, σ, tends to be determined by variation in system response when applied to flaws of the same size.

Taking measures to improve the system capability can be viewed at two levels: process optimization and process variation reduction. To provide an intuitive distinction between process optimization and process variation reduction, consider that any inspection process can be viewed as applying a stimulus to the structure and interpreting the “magnitude” of the response (in whatever form it may take). Different flaws of the same size, and multiple inspections of the same flaw under absolutely identical conditions, will produce different response magnitudes. Reducing the scatter in these response magnitudes is process optimization and leads to a smaller σ in the POD(a) function for that set of test conditions. When inspections of the same flaw are made under different inspection conditions, the magnitude of the inspection result will also vary, perhaps significantly. Since the different inspection conditions are all representative of the application, the effect of this variation must also be included in the capability experiment, and its effect also shows up as an increase in σ. Reducing the scatter in response magnitudes that results from different test conditions is process variation reduction.

Inspection process optimization should have been performed prior to the evaluation experiment and, in fact, could have been accomplished using designed experiments as discussed herein. The optimization process leads to the definition of the test procedures (Subsection 4.3.3) and provides the basis for demonstrating that the system is in a state of statistical control (Subsection 4.3.4).

However,  process  optimization  cannot  be  based  on  fixing  all  factors  which  might  influence 
probability  of  detection.  Some  factors  will  inherently  change  during  the  application  of  the 
system.  For  example,  apparently  identical  probes  do  produce  different  responses  when  applied 
to  the  same  flaw  and  different  inspectors  do  have  different  levels  of  proficiency  at  applying  the 
inspection  stimuli  and  interpreting  the  response.  Probes  and  inspectors  have  their  own  POD(a) 
functions  for  the  system  and  the  scatter  of  these  functions  is  the  process  variation.  These  latter 
types  of  factors  should  have  been  accounted  for  in  the  design  of  the  evaluation  experiment.  If 
so, their effect on the POD function can be determined and, if significant, can indicate a
direction  for  improving  the  process. 
D.3 Analysis of Data from One-Factor-At-A-Time Experiments

While  the  overall  goal  of  an  NDE  demonstration  is  to  describe  the  system  capability  with  a 
single  POD(a)  relationship,  it  is  often  necessary  to  compare  individual  POD(a)  curves.  The 
implicit  assumption  in  using  a  single  curve  to  represent  an  entire  NDE  system  is  that  the 
influences  of  system  parameters  such  as  inspector  or  probe  are  random  and  of  the  same  order 
as  system  “noise”  or  random  error.  By  comparing  POD(a)  curves,  the  hypothesis  that  the 
individual  curves  each  represent  the  same  NDE  system  capability  can  be  tested  statistically. 
Data  can  then  be  combined  to  produce  a  single  POD(a)  curve  which  represents  the  entire  NDE 
system. 

D.3.1  Comparing  Two  POD(a)  Curves 

One  of  the  useful  properties  of  maximum  likelihood  estimators  (cf.  Appendix  C.2.3 ),  such  as 
those  describing  the  POD(a)  relationship,  is  that  they  are  asymptotically  normally  distributed  as 
the  sample  size  increases.  These  normal  characteristics  can  be  used  to  compare  two  POD(a) 
curves. 


Let $X_1 = (\hat{\mu}_1, \hat{\sigma}_1)^T$ and $X_2 = (\hat{\mu}_2, \hat{\sigma}_2)^T$ be the estimated inspection behavior for
curves 1 and 2, respectively.


If $M_1$ and $M_2$ are the true mean vectors, then the expected difference between $X_1$ and
$X_2$ is $M_1 - M_2$, and, assuming the two estimates are independent, the variance-covariance
matrix of the difference is the sum of the individual covariance matrices:

$$\operatorname{Cov}(X_1) + \operatorname{Cov}(X_2) = \Sigma_1 + \Sigma_2 \qquad \text{[D.1]}$$


By the central limit theorem,

$$(X_1 - X_2) \sim N_p\left[(M_1 - M_2),\ (\Sigma_1 + \Sigma_2)\right] \qquad \text{[D.2]}$$


where $N_p$ indicates a p-variate normal population. Since there are 2 parameters in the POD
model, p = 2. Under the null hypothesis, both POD curves represent the same (unknown)
actual capability, $(\mu, \sigma)^T = M$.

Thus, $M_1 = M_2 = M$.

If the curves are similar, the statistical distance between them should be small. The squared
statistical distance from $(X_1 - X_2)$ to $(M_1 - M_2) = 0$ is

$$T^2 = (X_1 - X_2)^T \left[\Sigma_1 + \Sigma_2\right]^{-1} (X_1 - X_2) \qquad \text{[D.3]}$$

which is analogous to the square of the t statistic in univariate analysis. When the sample size
is large, $T^2$ has an approximate chi-square distribution with two degrees of freedom, $\chi^2_2$.
Now, $\Sigma^{-1}$, the inverse of the variance-covariance matrix of the model parameters $\mu$ and $\sigma$,
is called the Fisher information matrix. Further, the observed Fisher information is the
negative of the matrix of second partial derivatives of the log-likelihood function taken with
respect to the model parameters, and so is computed as part of the maximum likelihood
parameter estimation procedure.

To evaluate equation D.3, $\Sigma$ is computed for each curve by inverting its information matrix. The
resulting two variance-covariance matrices are added, as in equation D.1, and the sum is
inverted. This 2x2 matrix is then premultiplied by the 1x2 transpose of the vector of
differences between the parameters of curve 1 and curve 2, and postmultiplied by the 2x1
vector of differences. The result of equation D.3 is then compared with the appropriate critical
$\chi^2$ statistic, $\chi^2_{2,\,0.05} = 5.99$, for a 95% confidence ellipse.

If $T^2 > \chi^2_{2,\,0.05}$, the null hypothesis is not supported by the data, and curves 1 and 2 would be
considered statistically different.
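The $T^2$ procedure just described can be sketched in a few lines. The following is a minimal, hypothetical Python illustration, assuming each curve is summarized by its estimates $(\hat{\mu}, \hat{\sigma})$ and a 2x2 covariance matrix obtained by inverting the observed information matrix (the `inv2` helper does that inversion). The function names are illustrative, and since the Table D-1a numbers are not reproduced here, no inputs from the example are used. For two degrees of freedom the chi-square distribution has a closed form, so the critical value $-2\ln\alpha$ (5.99 for α = 0.05) and the p-value $e^{-T^2/2}$ need no statistical tables.

```python
import math


def inv2(m):
    """Invert a 2x2 matrix [[a, b], [c, d]] (e.g. an information matrix)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def t_squared(x1, cov1, x2, cov2):
    """Squared statistical distance between two (mu, sigma) estimates, eq. D.3."""
    diff = [x1[0] - x2[0], x1[1] - x2[1]]
    # Eq. D.1: covariance of the difference is the sum of the two covariances
    summed = [[cov1[r][c] + cov2[r][c] for c in range(2)] for r in range(2)]
    s_inv = inv2(summed)
    # Quadratic form diff^T [Sigma_1 + Sigma_2]^-1 diff
    return sum(diff[r] * s_inv[r][c] * diff[c] for r in range(2) for c in range(2))


def chi2_2df_critical(alpha=0.05):
    """Upper-alpha critical value of chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


def chi2_2df_pvalue(t2):
    """P(chi-square_2 > t2); exact closed form for 2 degrees of freedom."""
    return math.exp(-t2 / 2.0)


def curves_differ(x1, cov1, x2, cov2, alpha=0.05):
    """True when T^2 exceeds the critical value, i.e. H0 is not supported."""
    return t_squared(x1, cov1, x2, cov2) > chi2_2df_critical(alpha)
```

Applied to the estimates for two inspections such as A1 and 13, with each covariance matrix computed as `inv2` of that curve's information matrix, `curves_differ` carries out the same comparison as the hand calculation.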

Example 


Table D-1a provides the $X_1$, $X_2$, $\Sigma_1$, and $\Sigma_2$ matrices for semi-automated eddy current
inspections A1 and 13 in Table C-2, to illustrate the calculations comparing those two
inspections. The $T^2$ test can be performed on any hand-held calculator which supports matrix
arithmetic; no special software is required.

$T^2$ for inspection 13 (second operator, third inspection) was larger than the critical $\chi^2$ value
of 5.99, and so inspection 13 differed significantly from test A1, the first inspection performed. All 10
inspection capabilities are plotted in Figure D-1, and 13 appears unlike the others.


Table D-1a

Calculation Comparing Inspection A1 with 13
D.3.2 Comparing Many POD(a) Curves

The $T^2$ test compares one POD(a) relationship with another, and the preceding example
compared inspection 13 to A1. The selection of A1 as the standard against which another
inspection was compared was quite arbitrary. To avoid an arbitrary choice of a standard
inspection, it is desirable to compare all POD(a) curves with each other simultaneously. Since
there are two model parameters, $\mu$ and $\sigma$, the comparison must consider both parameters
and their possible interactive behavior.
This is accomplished by again exploiting the normal behavior of the model parameters and using
a statistical procedure called Multivariate ANalysis Of VAriance, MANOVA. Although a
thorough discussion is beyond the scope of this document, and the arithmetic for its
implementation is tedious, the underlying idea is simple: compare the variation within the
POD(a) relationships with the variation exhibited between inspections. This is done by taking
the ratio of the magnitude of the within variation to the magnitude of the overall total (within plus
between) variation.

The determinant of the variance-covariance matrix is called the generalized sample variance,
and is a convenient single value which summarizes the magnitude of the variation in $\Sigma$. So the
magnitude of the variability within inspections, $|W|$, is the determinant of the sum of the
covariance matrices of the model parameters times the sample size (the number of specimens
used to produce the individual POD(a) curves):

$$|W| = \left|\, n\,[\Sigma_1 + \Sigma_2 + \cdots + \Sigma_g] \,\right| \qquad \text{[D.4]}$$

where g is the number of groups, that is, the number of POD(a) curves being compared, and
n is the number of specimens being inspected.

The multiplication by n converts $\Sigma$ from a matrix of mean squares and cross-products to one of
summed squares and cross-products, SSC. It is the SSC which will be used in the test
statistic, $\Lambda^*$, to be described later.

The variability between inspections is estimated from the model parameters themselves as the
sum of squares and cross-products:

$$B = \sum_{i=1}^{g} (X_i - \bar{X})(X_i - \bar{X})^T \qquad \text{[D.5]}$$

where g is the number of groups and $\bar{X}$ is the mean of the $X$ vectors.

The magnitude of the total variability is the determinant of the sum of the within and between
matrices, $|B + W|$.

The ratio of the magnitude of within variability to total variability is called Wilks's Lambda, $\Lambda^*$:

$$\Lambda^* = \frac{|W|}{|B + W|} \qquad \text{[D.6]}$$
For a two-parameter model, this test statistic is related to F by

$$\left(\frac{N - g - 1}{g - 1}\right)\left(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2(g-1),\,2(N-g-1),\,\alpha} \qquad \text{[D.7]}$$

where n is the number of specimens, g is the number of groups, and N = ng is the total
number of specimen inspections.
If $\Lambda^*$ is too small, that is, if the total variation is large compared with the within-inspection
variation, then the between-inspections variability cannot be explained by chance alone, and the
curves are considered significantly different. Equivalently, the left-hand side of equation D.7
exceeds the critical value $F_{2(g-1),\,2(N-g-1),\,\alpha}$.

For a discussion of MANOVA and other related topics, see Johnson and Wichern, Applied
Multivariate Statistical Analysis, 2nd ed., 1988, Prentice Hall.
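The within, between, and Wilks's Lambda calculations of equations D.4 through D.7 are mechanical once the parameter estimates and their covariance matrices are available. The following Python sketch is hypothetical: it assumes g curves, each produced from the same n specimens, and it stops at the F statistic and its degrees of freedom, leaving the critical value to an F table or a statistics package (the F distribution is not in the Python standard library).

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix (the generalized variance)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(xs, covs, n):
    """Wilks's Lambda for g two-parameter POD curves (eqs. D.4 to D.6).

    xs   -- list of (mu, sigma) estimates, one per inspection
    covs -- list of 2x2 covariance matrices of those estimates
    n    -- number of specimens inspected to produce each curve
    """
    g = len(xs)
    # Within matrix W = n [Sigma_1 + ... + Sigma_g], eq. D.4
    w = [[n * sum(c[r][s] for c in covs) for s in range(2)] for r in range(2)]
    # Between matrix B = sum_i (X_i - Xbar)(X_i - Xbar)^T, eq. D.5
    xbar = [sum(x[r] for x in xs) / g for r in range(2)]
    b = [[sum((x[r] - xbar[r]) * (x[s] - xbar[s]) for x in xs) for s in range(2)]
         for r in range(2)]
    total = [[b[r][s] + w[r][s] for s in range(2)] for r in range(2)]
    return det2(w) / det2(total)  # eq. D.6


def wilks_f(lam, n, g):
    """F statistic and its degrees of freedom for a two-parameter model, eq. D.7."""
    big_n = n * g
    f = ((big_n - g - 1) / (g - 1)) * (1.0 - math.sqrt(lam)) / math.sqrt(lam)
    return f, 2 * (g - 1), 2 * (big_n - g - 1)
```

A small $\Lambda^*$ (equivalently, a large F relative to the tabulated value for the returned degrees of freedom) indicates that at least one curve differs from the others.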


Example 

The  inspections  in  Table  C-2  were  compared  using  a  MANOVA,  which  showed  them  to  differ 
significantly.  Removing  inspection  13  and  performing  a  second  MANOVA  on  the  remaining 
nine  inspections  showed  no  difference  among  them.  Inspection  13  is  statistically  different  from 
the  others.  These  results  are  summarized  in  Table  D-2. 


Table  D-2a 

Mean  Vectors  and  Covariance  Matrices  for  Inspections  in  Table  C-2 
D.4 Analysis of Data from Factorial Experiments


The statistical tests discussed in the previous section may indicate that the performance of a
particular inspection differs from those against which it is being compared. They do not,
however, provide specific information as to the cause of the difference. To do this, the overall
observed variance must be partitioned into its constituent components. The resulting analysis
will then permit assignment of causes for differing NDE capabilities, and thus allow for remedial
action. It must be noted that, in general, the components of variance cannot be determined
unless the experiment was planned to accomplish this. It is very important, therefore, that
proper consideration be given to this goal before any experimentation is carried out and before
any data are collected. See Section 4.3, Demonstration Design, and Appendix A, Test
Program Guidelines.


The  methods  discussed  previously  were  developed  to  compare  inspection  systems  using  data 
not  specifically  gathered  for  that  purpose.  A  designed  experiment  can  provide  more  engineering 
information  from  a  given  number  of  tests  than  is  available  from  the  one-factor-at-a-time  data 
presented  in  Table  D-2.  The  following  sections  describe  methods  which  can  be  used  with  data 
from  a  statistically  designed  experiment. 


D.4.1  Factorial  Experimental  Design 


In any NDE demonstration there will be a certain amount of variation from inspection to
inspection. With the proper demonstration design, this variation can be partitioned into
components of variance, each component being assignable to a specific cause, or factor. In
some instances, interactions among the factors influencing NDE capability can also be
identified. Furthermore, the resulting estimates of the model parameters, $\hat{\mu}$ and $\hat{\sigma}$, will be
more precise because they are based on the average behavior of several inspections. These
types of demonstration designs are called Factorial Designs, because they can identify the
factors causing (non-random) variation.


Example 


The $\hat{a}$ vs. a data in Table D-3 were part of a demonstration designed to assess the influence
on POD of different operators, different probes, and different positions of the piece being
inspected using a semi-automated ET system. Data in Table D-3 and similar data for eight
other inspections were analyzed using the maximum likelihood parameter estimation procedure
described in this document.

The NDE demonstration was a factorial test to evaluate the influence on POD(a) of three
different OPerators (OP), three PRobes (PR), and two Positions (POS) of the workpiece
being inspected. Results are summarized in Table D-4.

D.4.2 Effect of NDE Process Parameters on μ and σ Individually


The methods presented here can be used to compare POD(a) relationships which result from
either $\hat{a}$ vs. a data or hit/miss data. They are straightforward applications of well-known
statistical procedures and can be performed by many commercially available statistical packages.


Table D-3

â vs a Data for Web/Bore Surface Flaws, Semi-Automated Inspection

  a        â          a        â          a        â
0.001    (1.0)      0.009     1.60      0.015    10.10
0.003    (1.0)      0.009     4.40      0.016    11.00
0.003    (1.0)      0.010     5.10      0.019    15.00
0.006     3.800     0.010     6.60      0.022    22.00
0.007     3.000     0.011     6.00      0.029    29.00
0.007     2.900     0.011     8.40      0.031    38.00
0.008     3.900     0.012     5.80      0.042    31.00
0.008     3.600     0.013    57.40      0.065    49.00
0.009     2.200     0.014     2.20      0.100    80.30

Notes:
1. a is crack size in inches
2. â is apparent size (see text)
3. *, ** censored observations:
   * unknown, below â_th = 1.0
   ** unknown, above â_sat = 20.0


Table D-4

Model Parameters for Semi-Automated Inspections

OP  PR  POS   a50          σ          α        β        δ         n1   n2   n3
1   1   1     0.00326130   0.235297   8.0673   1.4090   0.33153   25   3    0
1   2   1     0.00335512   0.260288   8.0807   1.4184   0.36918   26   3    0
1   3   2     0.00337838   0.201442   8.2139   1.4435   0.29078   25   3    0
2   1   2     0.00335999   0.400897   7.9109   1.3889   0.55680   24   4    0
2   2   1     0.00354285   0.393517   8.1534   1.4449   0.56860   24   4    0
2   3   1     0.00339956   0.399634   8.0139   1.4099   0.56343   24   4    0
3   1   1     0.00302999   0.233559   7.9871   1.3773   0.65326   25   3    0
3   2   2     0.00336885   0.331408   7.8785   1.3839   0.45862   25   3    0
3   3   1     0.00337758   0.260116   8.1646   1.4348   0.34904   25   3    0

Notes:
1. a50 = crack size at 50% POD
2. n1 = total observations, n2 = data in noise, n3 = saturations


Often a quick comparison of the individual model parameters, considered separately, is
informative. An ANalysis Of VAriance, ANOVA, is performed which considers only one
model parameter at a time.


The statistical ANOVA model is $y_{ijk} = \bar{y} + OP_i + PR_j + POS_k + \varepsilon_{ijk}$, where $y$ is the
model parameter (either $\mu$ or $\sigma$) being evaluated, $\bar{y}$ is the average parameter response,
and

i = 1...I, the number of operators,
j = 1...J, the number of probes,
k = 1...K, the number of positions, and
$\varepsilon_{ijk}$ is the random error.


The experiment has been designed so that an unambiguous test can be performed to determine
whether a difference between operators, between probes, or between positions is statistically
significant. The test used is an F test. The statistic has the form $F = s_1^2/s_2^2$, where $s_1^2$
and $s_2^2$ are two independent mean squares. This method assumes that the data come from
a normal distribution. Since $\hat{\mu}$ and $\hat{\sigma}$ are maximum likelihood estimates, which are
asymptotically normal (Section D.3.1), this is a reasonable assumption. The assumption is
particularly important for small sample sizes.


The F statistic is used to test hypotheses of the form $H_0: \sigma_1^2 = \sigma_2^2$; that is, is the
variance attributed to a specific cause equal to the variance due to random causes? If $\sigma_1^2$ is
greater than $\sigma_2^2$, then the variation in the response between the levels of a factor (e.g.,
operator, position, or probe) is greater than the experimental error. The ratio of the estimates of
these two components, F, should be approximately equal to one if the hypothesis is true, and
greater than one if the data do not support the hypothesis.


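Table D-5

Analysis of Variance Table

Source   df               SS                                                    MS                             F
OP       I - 1            $JK\sum_{i=1}^{I}(\bar{y}_{i..} - \bar{y}_{...})^2$    $s_1^2 = SS_{OP}/df_{OP}$      $s_1^2/s^2$
PR       J - 1            $IK\sum_{j=1}^{J}(\bar{y}_{.j.} - \bar{y}_{...})^2$    $s_2^2 = SS_{PR}/df_{PR}$      $s_2^2/s^2$
POS      K - 1            $IJ\sum_{k=1}^{K}(\bar{y}_{..k} - \bar{y}_{...})^2$    $s_3^2 = SS_{POS}/df_{POS}$    $s_3^2/s^2$
error    by subtraction   by subtraction                                         $s^2$
Total    IJK - 1          $\sum_i\sum_j\sum_k(y_{ijk} - \bar{y}_{...})^2$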
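The sums of squares in Table D-5 are simple to compute for a complete, balanced layout with one observation per OP x PR x POS cell. The Python sketch below assumes exactly that hypothetical complete I x J x K layout, with the error term obtained by subtraction as in the table. The nine inspections of Table D-4 do not fill every cell of the 3 x 3 x 2 layout, so the Type III sums of squares in Table D-6 come from a general linear model fit and will not be reproduced by this sketch; the function names are illustrative only.

```python
from itertools import product
from statistics import mean


def main_effects_anova(y, I, J, K):
    """Main-effects ANOVA (Table D-5) for a complete I x J x K layout.

    y maps (i, j, k) -> response (e.g. sigma-hat for one inspection), with
    i, j, k running over 0..I-1, 0..J-1 and 0..K-1; one observation per cell.
    """
    cells = list(product(range(I), range(J), range(K)))
    grand = mean(y[c] for c in cells)
    op = [mean(y[(i, j, k)] for j in range(J) for k in range(K)) for i in range(I)]
    pr = [mean(y[(i, j, k)] for i in range(I) for k in range(K)) for j in range(J)]
    pos = [mean(y[(i, j, k)] for i in range(I) for j in range(J)) for k in range(K)]
    ss = {
        "OP": J * K * sum((m - grand) ** 2 for m in op),
        "PR": I * K * sum((m - grand) ** 2 for m in pr),
        "POS": I * J * sum((m - grand) ** 2 for m in pos),
    }
    df = {"OP": I - 1, "PR": J - 1, "POS": K - 1}
    ss_total = sum((y[c] - grand) ** 2 for c in cells)
    # Error sum of squares and degrees of freedom by subtraction
    ss["error"] = ss_total - sum(ss.values())
    df["error"] = (I * J * K - 1) - sum(df.values())
    ss["total"], df["total"] = ss_total, I * J * K - 1
    return ss, df


def f_ratios(ss, df):
    """F = factor mean square / error mean square for each factor."""
    s2 = ss["error"] / df["error"]
    return {f: (ss[f] / df[f]) / s2 for f in ("OP", "PR", "POS")}
```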
Example

Using the data in Table D-4, the ANOVA for $\mu$ is:

Table D-6

ANOVA for Model Parameter μ

Source   DF   Type III SS   F Value   Prob > F
OP       2    0.00000005    2.36      0.24253
PR       2    0.00000007    3.63      0.15803
POS      1    0.00000000    0.35      0.59767

As F increases, the p-value decreases. The larger the differences between levels of a factor, the
larger the value of F, and the larger the F, the less credible the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ becomes.
A measure of this lack of credibility is the probability that an F as large as the observed F could
have occurred if $H_0$ were true. This probability is called the p-value associated with the observed F.
In practice, p-values at or below 0.10 or 0.05 are considered significant. In Table D-6, PRobe
is the most significant variable, although it is not statistically significant at the usual significance
levels (10%, 5%, or 1%).

The ANOVA for $\sigma$ is:

Here the p-value for operators is p = 0.01817, indicating a statistically significant difference among
the levels of operator.


D.4.3 Analysis of the Means


To perform the ANOVA, the mean was calculated for each level of each variable. Once a
significant difference has been detected by the ANOVA, the average values for each level of a
factor (the means) are examined to determine the magnitude of the difference between them and
to determine whether a variable which is statistically significant is also practically significant. For
example, a difference in $\mu$ may be statistically significant, but examination of the averages may
show that the largest difference between them is 0.001. Although this difference is statistically
significant, it is not practical to differentiate to the 0.001 level. Conversely, large differences which
are not statistically significant should be investigated, to determine whether the lack of significance
is due to having omitted a significant variable from the experiment or to a sample size that was
not large enough.


Example 


Table D-8 summarizes the analysis of means for the example used throughout Section D.4.
Given are the variable, the level of the variable, and the model parameter of interest (either $\mu$ or
$\sigma$). Here, a statistically significant difference (DIFF) is represented by a different letter of the
alphabet; levels sharing the same letter do not differ significantly.


Table D-8
Analysis of Means

Factor  Level  μ        DIFF  σ        DIFF
OP      1      0.00333  A     0.23234  B
OP      2      0.00344  A     0.39802  A
OP      3      0.00326  A     0.27503  B
PR      1      0.00322  A     0.28992  A
PR      2      0.00342  A     0.32840  A
PR      3      0.00339  A     0.28706  A
POS     1      0.00332  A     0.29707  A
POS     2      0.00337  A     0.31125  A

The means indicate that there is only one significant difference: that due to OP for the
parameter $\sigma$. Remember that this test is done at the α = 0.05 level of significance. A more,
or less, strict level may be required.
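The level means in Table D-8 are ordinary averages of the Table D-4 estimates over each level of a factor. The following Python sketch, offered only as an illustration, recomputes the σ column of Table D-8 from the nine Table D-4 rows. It does not perform the multiple-comparison test that assigns the DIFF letters, which is left to a statistics package, and the μ column is not recomputed here. The names `ROWS`, `FACTORS`, and `level_means` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (OP, PR, POS, sigma-hat) transcribed from Table D-4
ROWS = [
    (1, 1, 1, 0.235297), (1, 2, 1, 0.260288), (1, 3, 2, 0.201442),
    (2, 1, 2, 0.400897), (2, 2, 1, 0.393517), (2, 3, 1, 0.399634),
    (3, 1, 1, 0.233559), (3, 2, 2, 0.331408), (3, 3, 1, 0.260116),
]
FACTORS = {"OP": 0, "PR": 1, "POS": 2}


def level_means(rows, factor):
    """Average sigma-hat at each level of one factor ("OP", "PR" or "POS")."""
    column = FACTORS[factor]
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[3])
    return {level: mean(values) for level, values in sorted(groups.items())}


if __name__ == "__main__":
    for name in FACTORS:
        print(name, {lvl: round(m, 5) for lvl, m in level_means(ROWS, name).items()})
```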


D.4.4 Effect of NDE Process Parameters on μ and σ Jointly


Data from factorial designs can be analyzed using a MANOVA procedure similar to the one
described in Section D.3.2. However, there is a fundamental difference. With the
one-factor-at-a-time data it was possible only to conclude that the ten inspections were not all the
same; no further breakdown as to the influence of operator, eddy current probe, experimental
set-up, or other cause was possible. With a factorial design, the data are balanced so that the
influence of each factor can be identified by its contribution to the total sum of squares (a sort of
statistical distance between an individual observation and the average for that condition). The
MANOVA procedure is available in many commercially available statistical analysis software
packages.
A MANOVA simultaneously compares the variation in both model parameters, $\mu$ and $\sigma$,
which results from a given factor (or combination of factors) with the random variation observed
in the inspection system. This random, or error, component of variance can be estimated from
the variance-covariance structure of the data. The analysis can be greatly simplified, however,
by using instead the variation attributed to the highest-order interaction, for example, the
interaction among operator, probe, and position of the workpiece. It is unlikely that this
interaction would be as influential as the main effects (e.g., operator, probe, or position by
themselves) or as the second-order interactions (e.g., operator-probe, operator-position,
probe-position). Confounding this third-order interaction with random error greatly simplifies the
subsequent MANOVA because the individual variance-covariance matrices do not have to
be evaluated as part of the analysis. Even with a packaged program, keying in many large
variance-covariance matrices is tedious work. The simplified procedure requires only the model
parameters themselves, and that they have resulted from a factorial NDE demonstration design.


Example 


A multivariate analysis of variance (MANOVA) was performed on the data resulting from the
factorial design summarized in Table D-4. Wilks's Lambda was computed as the criterion, and
an F test was performed.


Table D-9

MANOVA for Model Parameters, μ and σ

Factor   F      p
OP       3.88   0.10868
PR       1.40   0.37674
POS      0.20   0.83501

Overall, operator has an effect on POD when both $\mu$ and $\sigma$ are considered simultaneously;
it has the largest F (3.88) and the smallest p-value (0.10868) of the three factors.
Changing $\mu$ moves the POD curve horizontally. Changing $\sigma$ varies the shape of the curve,
but not its central location. The MANOVA calculations test whether these combined effects are
significant in showing a difference among operators, probes, or positions.


D.4.5  Components  of  Variation 


The total variation can be decomposed into components due to each factor (OP, PR,
POS, error). The mean square for a factor is not an expression of variance for that factor
alone, but is a function of that factor and possibly other factors: its expected mean square (EMS).


The components of variation in $\mu$ and $\sigma$ for each factor can be found by substituting the
estimate of error, V(error) = 0.00326027, and setting each mean square equal to its EMS. Table
D-10 illustrates these calculations for this example.

Sometimes negative estimates of the components of variance occur because of rounding or a general
lack of significance of any variable. In this case the components are set equal to zero.
Table D-10

MANOVA for Model Parameters, μ and σ

Source   Type III Expected Mean Square
OP       V(error) + 3V(OP)
PR       V(error) + 3V(PR)
POS      V(error) + 4V(POS)
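Solving each row of Table D-10 for the factor's component is a one-line calculation: V(factor) = (MS − V(error)) / c, where c is the multiplier of V(factor) in the expected mean square, with negative estimates set to zero as recommended above. The following Python sketch assumes the factor mean squares are already available from the ANOVA; the mean squares in the usage line are hypothetical, not values from this example, and the function names are illustrative.

```python
# Multipliers of V(factor) in the Type III expected mean squares of Table D-10
EMS_MULTIPLIER = {"OP": 3, "PR": 3, "POS": 4}


def variance_component(ms_factor, v_error, multiplier):
    """Solve MS = V(error) + multiplier * V(factor) for V(factor).

    Negative estimates are truncated to zero.
    """
    return max(0.0, (ms_factor - v_error) / multiplier)


def components(mean_squares, v_error):
    """Variance component for every factor listed in mean_squares."""
    return {f: variance_component(ms, v_error, EMS_MULTIPLIER[f])
            for f, ms in mean_squares.items()}


if __name__ == "__main__":
    # Hypothetical mean squares, for illustration only
    print(components({"OP": 0.010, "PR": 0.005, "POS": 0.002}, v_error=0.004))
```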

Fig. D-1 Composite Plot for Semi-Automated Inspections Showing Inspection 13 To Be Different (horizontal axis: actual defect size, depth in inches)
APPENDIX E

EXAMPLE  DATA  REPORTS 


This  appendix  presents  sample  data  sheets 
for  reporting  test  matrices  and  the  results  of 
individual  inspections.  Examples  of  summary 
results  are  also  included  for  reference. 


E.1  TEST  MATRIX 


Figures  E-1  and  E-2  are  examples  of  two 
methods  for  summarizing  the  description  of  a 
capability  evaluation  test  matrix.  For  this 
example  it  was  assumed  that  the  assessment 
of  an  ET  system  was  to  include  the  effects  of 
two  operators,  two  probes,  and  two 
replications.  Figure  E-1  is  essentially  a  list 
of  the  combinations  of  the  levels  of  the  test 
matrix.  Figure  E-2  is  a  table  of  the  test  factor 
combinations  and  shows  the  levels  of  all  of  the 
factors  being  evaluated.  Although  Figure  E-2 
more  clearly  displays  the  experimental  design, 
this  format  becomes  unwieldy  if  the 
experiment  contains  more  than  four  factors  or 
more  than  three  levels  of  the  factors. 
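The list format of Figure E-1 is an enumeration of all factor-level combinations, followed by a randomized run order. A small Python sketch under those assumptions (the function name and the `seed` argument are illustrative): test identification numbers are built by concatenating the operator, probe, and replication numbers, as in Figure E-1, which only stays unambiguous while every factor has fewer than ten levels.

```python
import itertools
import random


def test_matrix(n_operators=2, n_probes=2, n_reps=2, seed=None):
    """Complete factorial test matrix plus a randomized run order.

    Test IDs concatenate the operator, probe and replication numbers,
    e.g. "121" = operator 1, probe 2, replication 1 (as in Fig. E-1).
    """
    runs = [
        {"id": f"{op}{pr}{rep}", "operator": op, "probe": pr, "replication": rep}
        for op, pr, rep in itertools.product(
            range(1, n_operators + 1), range(1, n_probes + 1), range(1, n_reps + 1))
    ]
    order = list(runs)
    random.Random(seed).shuffle(order)  # randomized inspection sequence
    return runs, order


if __name__ == "__main__":
    runs, order = test_matrix(seed=1)
    for run in runs:
        print(run["id"], run["operator"], run["probe"], run["replication"])
    print("run order:", [run["id"] for run in order])
```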

E.2  INDIVIDUAL  TEST  RESULTS 

Figure E-3 is an example data sheet for a
permanent record of the individual test results
of an NDE evaluation. The results from each
inspection of the specimen set under a defined
set of conditions are presented in the column
for the specific test.

E.3  ANALYSIS  RESULTS 

Figures E-4 and E-5 present examples of
the â vs. a and hit/miss analyses,
respectively. In both of the examples, the
analysis provided complete sets of parameter
estimates.

Examples of the plots required in the results
summary are presented in Figures E-6
through E-9. The POD(a) functions with 95
percent confidence limits for the analyses of
Figures E-4 and E-5 are presented in
Figures E-6 and E-7, respectively. These
figures illustrate the minimum information that
must be included on all plots of the POD(a)
function. Figure E-8 presents the log â vs.
log a plot for the analysis of Figures E-4 and
E-6. The POD(a) function and the observed
detections for the hit/miss analysis of
Figures E-5 and E-7 are presented in
Figure E-9.
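The percentile estimates printed in Figures E-4 and E-5 follow from the fitted parameters. As a hedged illustration, assuming the lognormal POD model named in the Figure E-5 header, POD(a) = Φ((ln a − μ)/σ) and the size detected with probability p is a_p = exp(μ + z_p σ). For the â vs. a case the sketch additionally assumes the usual relation for the regression ln(â) = B0 + B1 ln(a) of Figure E-4, namely μ = (ln â_th − B0)/B1, and takes σ directly from the printout. The a90/95 bound requires the covariance matrix and is not computed; small differences from the printed values reflect rounding of the printed estimates. Function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pod(a, mu, sigma):
    """Lognormal POD(a) = Phi((ln a - mu) / sigma)."""
    return STD_NORMAL.cdf((math.log(a) - mu) / sigma)


def a_p(p, mu, sigma):
    """Flaw size detected with probability p: exp(mu + z_p * sigma)."""
    return math.exp(mu + STD_NORMAL.inv_cdf(p) * sigma)


def ahat_mu(threshold, b0, b1):
    """mu for ln(ahat) = B0 + B1 ln(a): log of the size whose mean response equals the threshold."""
    return (math.log(threshold) - b0) / b1


if __name__ == "__main__":
    # Hit/miss estimates printed in Fig. E-5
    mu, sigma = 4.62, 0.630
    print("a50 =", a_p(0.5, mu, sigma), "a90 =", a_p(0.9, mu, sigma))
    # a-hat vs a regression of Fig. E-4 at threshold 70, printed sigma .328
    mu_e4 = ahat_mu(70.0, 3.06, 1.44)
    print("a50 =", a_p(0.5, mu_e4, 0.328), "a90 =", a_p(0.9, mu_e4, 0.328))
```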
EXPERIMENTAL DESIGN DATA SHEET

DATE: ______   EXPERIMENT ID NUMBER: ______

NDE SYSTEM: ______   SPECIMEN SET: ______

ORGANIZATION: ______

OBJECTIVE: To evaluate Station 1 of the RFC system for two randomly selected
operators, probes and replications in a complete factorial experiment.

Test             Operator   Probe    Replication
Identification   Number     Number   Number
111              1          1        1
112              1          1        2
121              1          2        1
122              1          2        2
211              2          1        1
212              2          1        2
221              2          2        1
222              2          2        2

Randomization: The eight sets of inspections were conducted in a random order.


Fig. E-1 Example data sheet for describing the
experimental design: list format
EXPERIMENTAL DESIGN DATA SHEET

DATE: ______   EXPERIMENT ID NUMBER: ______

NDE SYSTEM: ______   SPECIMEN SET: ______

ORGANIZATION: ______

OBJECTIVE: To evaluate Station 1 of the RFC system for two randomly selected
operators, probes and replications in a complete factorial experiment.

Table of Test Identification Numbers

                    Operator 1   Operator 2
Probe 1 - Rep 1     111          211
        - Rep 2     112          212
Probe 2 - Rep 1     121          221
        - Rep 2     122          222

Randomization: The eight sets of inspections were conducted in a random order.


Fig. E-2 Example data sheet for describing the
experimental design: table format
TEST RESULT DATA SHEET

Page ___ of ___

DATE: ______   EXPERIMENT ID NUMBER: ______

NDE SYSTEM: ______   SPECIMEN SET: ______

AHAT VS A POD ANALYSIS
VERSION 2.3b

DATE: 30-JUL-90

IDENTIFICATION:  FILE - RFC2WBIN.DAT
                 DATA SET - WBIN100
                 INSPECTIONS - A C D

REGRESSION ANALYSIS
MODEL: LN(AHAT) = B0 + B1*LN(A)

CRACK SIZE RANGE: 1.00 TO 100.
NUMBER OF UNCENSORED CRACKS: 25
RECORDING THRESHOLD: 70.     NUMBER OF CRACKS BELOW THRESHOLD: 2
SATURATION LEVEL: 4095.      NUMBER OF CRACKS AT SATURATION: 1

PARAMETER ESTIMATES
PARAMETER          ESTIMATE   SE
INTERCEPT (B0)     3.06       .300
SLOPE (B1)         1.44       .116
RESIDUAL ERROR     .417       .593E-01

REPEATABILITY ERROR: .268

POD MODEL PARAMETER ESTIMATES
SIGMA: .328

INSPECTION
THRESHOLD   A50    A90    A90/95   V11        V12         V22
70.0        2.29   3.48   4.65     .212E-01   -.325E-02   .193E-02
100.        2.93   4.46   5.79     .162E-01   -.276E-02   .193E-02
200.        4.76   7.22   8.99     .888E-02   -.180E-02   .193E-02
270.        5.84   8.89   11.0     .665E-02   -.139E-02   .193E-02
300.        6.29   9.57   11.8     .599E-02   -.124E-02   .193E-02
350.        7.00   10.7   13.1     .517E-02   -.103E-02   .193E-02

Fig. E-4 â vs a analysis


HIT/MISS POD ANALYSIS
LOGNORMAL MODEL
VERSION 2.3

DATE: 30-JUL-90

IDENTIFICATION:  FILE - PADMOD.PF
                 DATA SET - SET2FPI
                 INSPECTIONS - 1 2 3 6 9

NUMBER OF VALID CASES: 36
CRACK SIZE RANGE: 8.0 TO 275.0
THRESHOLD: .5

MAXIMUM LIKELIHOOD ESTIMATES:
MU-HAT = 4.62   SIGMA-HAT = .630

PERCENTILE ESTIMATES:
A50 = 101.   A90/50 = 227.   A90/95 =

.730E-04

ESTIMATED VARIANCE/COVARIANCE MATRIX OF THE
MAXIMUM LIKELIHOOD ESTIMATES:

             MU-HAT     SIGMA-HAT
MU-HAT       .286E-01   .466E-02
SIGMA-HAT    .466E-02   .483E-01

Fig. E-5 Hit/miss analysis


Fig. E-6 POD(a) for â vs a analysis (probability of detection versus crack depth, mils)

Fig. E-7 POD(a) for hit/miss analysis (probability of detection versus crack length, mils)
Methods to quantify NDI reliability and capability have been evolving for over twenty-five years.
Initial attempts were qualitative rather than quantitative. With the advent of damage tolerance
methodologies, it has become imperative to express more accurately the probability of detection for a
given inspection method and inspection system. This Lecture Series is aimed at providing a
methodology to quantify probability of detection. This methodology includes, but is not limited to,
design of experiments, specimen generation and maintenance, statistical analyses, data reduction
and presentation, evaluation of inspection results in retirement for cause decisions, and the
procedure required to establish a reliable probability-based inspection for detecting anomalies in
engine parts. The material to be presented is applicable to civil as well as military aircraft and
turbine engine manufacturing and maintenance organizations. The lectures will examine the
detection capabilities of fluorescent penetrant inspection, eddy current, ultrasonic, and magnetic
particle inspection.

This Lecture Series incorporates lessons learned in the design of experiments to validate NDE/NDI
systems and in the interpretation of the results of these experiments. Samples of specimens
used in NDE/NDI reliability programmes will be available for inspection by attendees. The
Lecture Series also includes examples to help with the understanding of design of experiments and
the statistical modelling for probability of detection analyses.

This  Lecture  Series,  sponsored  by  the  Structures  and  Materials  Panel  of  AGARD,  has  been 
implemented  by  the  Consultant  and  Exchange  Programme. 
AGARD has held no stocks of its publications. Beginning in 1993, AGARD will hold a limited stock of the publications associated with
Lecture Series and Special Courses, as well as AGARDographs and Working Group reports, organized and published from 1993 onwards.
Inquiries should be addressed to AGARD by letter or fax at the address given above. Please do not telephone.
AGARD publications are initially distributed to the NATO member countries through the National Distribution Centres listed below.
Further copies are sometimes available from these centres (except in the United States). If you wish to receive all AGARD publications,
or only those relating to specific Panels, you may ask to be included on the mailing list of one of these centres. AGARD publications
are available for purchase from the agencies listed below, in photocopy or microfiche form.
The United States National Distribution Centre (NASA/Langley) does NOT hold stocks of AGARD publications.

Applications for photocopies should be made directly to the NASA Center for Aerospace Information
(CASI) at the following address:


NASA Center for Aerospace Information (CASI)
P.O. Box 8757
BWI Airport, Maryland 21240
United States
Requests for microfiches or photocopies of AGARD documents (including requests to CASI) should include
the word "AGARD" and the AGARD serial number (for example AGARD-AG-315). Further information, such as
the title and publication date, is desirable. Note that AGARD Reports and AGARD Advisory Reports should be
specified as AGARD-R-nnn and AGARD-AR-nnn, respectively, when ordering. Full bibliographical references
and abstracts of AGARD publications are given in the following journals:


Scientific and Technical Aerospace Reports (STAR)